FilePathParser

Universal, extensible Python library for extracting structured information (groups, dates, times, custom patterns) from file names and paths.

No hardcoded logic: you choose any number of groups (lists, enums, dicts, strings).
Automatic date and time search (many formats supported and validated).
Unlimited custom patterns: add your own regex groups.
Configurable priority: filename or path takes precedence.
Supports str and pathlib.Path.
Returns None if not found or not valid.
Custom patterns (cam\d+, count\d+) automatically return only the number (e.g. "cam15" → "15").

Table of Contents

Installation
Supported Date and Time Formats
Usage Examples
API Reference
How It Works
Notes
Command-Line Interface (CLI)
Contributing
Project Board
FAQ / Known Issues
About PatternMatcher.find_special
Author
License

Installation

pip install file_path_parser

Supported Date and Time Formats

Date examples:

20240622 (YYYYMMDD)
2024-06-22 (YYYY-MM-DD)
2024_06_22 (YYYY_MM_DD)
22.06.2024 (DD.MM.YYYY)
22-06-2024 (DD-MM-YYYY)
220624 (YYMMDD)
2024-6-2, 2024_6_2 (YYYY-M-D, YYYY_M_D, month and day without zero padding)

Time examples:

154212 (HHMMSS)
1542 (HHMM)
15-42-12 (HH-MM-SS)
15_42_12 (HH_MM_SS)
15-42, 15_42 (HH-MM, HH_MM)

All dates and times are validated. E.g. "20241341" is not a date; "246199" is not a time.
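
The validation step can be pictured with a minimal sketch that first checks the shape of a candidate with a regex and then asks datetime.strptime whether it is a real date or time. The format tables and the `validate` helper below are illustrative assumptions covering only a few of the formats listed above; they are not the library's internal code.

```python
import re
from datetime import datetime
from typing import Optional, Sequence, Tuple

# Illustrative (regex, strptime format) pairs; the library's real table may differ.
DATE_FORMATS = [
    (r"\d{8}", "%Y%m%d"),
    (r"\d{4}-\d{2}-\d{2}", "%Y-%m-%d"),
    (r"\d{2}\.\d{2}\.\d{4}", "%d.%m.%Y"),
]
TIME_FORMATS = [
    (r"\d{6}", "%H%M%S"),
    (r"\d{4}", "%H%M"),
    (r"\d{2}-\d{2}-\d{2}", "%H-%M-%S"),
]


def validate(candidate: str, formats: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Return the candidate if one format accepts it as a real value, else None."""
    for pattern, fmt in formats:
        # Shape check first, so strptime only sees strings of the right layout.
        if not re.fullmatch(pattern, candidate):
            continue
        try:
            datetime.strptime(candidate, fmt)
        except ValueError:
            continue  # right shape, but not a real date/time
        return candidate
    return None


print(validate("20240622", DATE_FORMATS))  # 20240622
print(validate("20241341", DATE_FORMATS))  # None
print(validate("246199", TIME_FORMATS))    # None
```

The regex check matters because strptime on its own is lenient about digit counts, so a shape check keeps each format from accepting strings meant for another one.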

Usage Examples

1. Lists and Tuples as Groups

from file_path_parser import FilePathParser

animals = ["cat", "dog"]
shifts = ("night", "day")
departments = {"department": ["prod", "dev", "test"]}

parser = FilePathParser(
    animals,
    shifts,
    departments,
    date=True,
    time=True,
    patterns={"cam": r"cam\d{1,2}"}
)

result = parser.parse("cat_night_dev_cam08_20240622_1542.jpg")
print(result)
# {
#   "group1": "cat",
#   "group2": "night",
#   "department": "dev",
#   "date": "20240622",
#   "time": "1542",
#   "cam": "08"
# }

2. Enum as Groups

from file_path_parser import FilePathParser
from enum import Enum

class Shift(Enum):
    NIGHT = "night"
    DAY = "day"

class Animal(Enum):
    CAT = "cat"
    DOG = "dog"

parser = FilePathParser(
    Animal,
    Shift,
    date=True,
    time=True,
    patterns={"cam": r"cam\d{1,2}"}
)

result = parser.parse("dog_day_cam12_2024-06-23_1730.avi")
print(result)
# {
#   "animal": "dog",
#   "shift": "day",
#   "date": "2024-06-23",
#   "time": "1730",
#   "cam": "12"
# }

3. Dictionary as Group

from file_path_parser import FilePathParser

departments = {"department": ["it", "finance", "marketing"]}
levels = {"level": ("junior", "middle", "senior")}
flags = {"flag": "urgent"}

parser = FilePathParser(
    departments,
    levels,
    flags,
    date=True,
    patterns={"ticket": r"T\d{3,5}"}
)

result = parser.parse("finance_senior_urgent_T1004_20240601.txt")
print(result)
# {
#   "department": "finance",
#   "level": "senior",
#   "flag": "urgent",
#   "date": "20240601",
#   "ticket": "1004"
# }

4. Mixed Groups: Enum, List, Custom Patterns, Date, and Time

from file_path_parser import FilePathParser
from enum import Enum

class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"

parser = FilePathParser(
    Status,
    ["alpha", "beta"],
    date=True,
    time=True,
    patterns={"session": r"session\d+"}
)

result = parser.parse("beta_open_session27_2023-12-31_2359.txt")
print(result)
# {
#   "status": "open",
#   "group2": "beta",
#   "date": "2023-12-31",
#   "time": "2359",
#   "session": "27"
# }

5. Only Custom Patterns and Date/Time

from file_path_parser import FilePathParser

parser = FilePathParser(
    date=True,
    time=True,
    patterns={"id": r"id\d+", "batch": r"batch\d{2,4}"}
)

result = parser.parse("id99_batch012_20240701_1430.log")
print(result)
# {
#   "date": "20240701",
#   "time": "1430",
#   "id": "99",
#   "batch": "012"
# }

6. If both the path and filename contain a group or date, the value from the priority parameter wins.

from file_path_parser import FilePathParser

parser = FilePathParser(["prod", "test"], date=True, priority="filename")
# 'prod' is in the path, 'test' is in the file name
result = parser.parse("/data/prod/archive/test_20240620.csv")
print(result)
# With priority="filename", group1 == "test"
# With priority="path", group1 == "prod"
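
To make the priority rule concrete, here is a small standalone sketch that looks for an allowed value in the file name and in the directory part separately, then picks one according to `priority`. The helper name `pick_by_priority`, the separator set, and the fallback to the other source when the preferred one has no match are all assumptions for illustration, not the library's documented behavior.

```python
import re
from pathlib import PurePosixPath
from typing import Optional, Sequence


def pick_by_priority(
    full_path: str, allowed: Sequence[str], priority: str = "filename"
) -> Optional[str]:
    """Return an allowed value from the preferred source (hypothetical helper)."""
    path = PurePosixPath(full_path)
    # Blocks from the file name without its extension.
    name_blocks = re.split(r"[_\-.]", path.stem)
    # Blocks from every directory component of the path.
    dir_blocks = [b for part in path.parent.parts for b in re.split(r"[_\-.]", part)]

    in_name = next((b for b in name_blocks if b in allowed), None)
    in_dirs = next((b for b in dir_blocks if b in allowed), None)

    # Preferred source first; fall back to the other one (an assumption).
    ordered = (in_name, in_dirs) if priority == "filename" else (in_dirs, in_name)
    return next((value for value in ordered if value is not None), None)


path = "/data/prod/archive/test_20240620.csv"
print(pick_by_priority(path, ["prod", "test"], priority="filename"))  # test
print(pick_by_priority(path, ["prod", "test"], priority="path"))      # prod
```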

Custom Pattern Number Extraction

When you provide a custom pattern like "cam\d+" or "count\d+", the parser automatically extracts only the numeric part (e.g., "cam15" → "15").
You don't need to manually add parentheses around the digits: the parser will do it for you!

If you provide an explicit capture group (e.g., "cam(\d+)"), the parser will use your group as-is.

Example

parser = FilePathParser(patterns={"cam": r"cam\d+", "count": r"count\d+"})
result = parser.parse("cam15_count123.txt")
print(result)
# {'cam': '15', 'count': '123'}

If you want to capture a more complex value, you can use your own group:

parser = FilePathParser(patterns={"cam": r"camA(\d+)"})
result = parser.parse("camA15_B.txt")
print(result)
# {'cam': '15'}
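
The behavior described in this section can be approximated with the standard re module. The sketch below is one possible way to get the same visible result, assuming a hypothetical `extract` helper: it returns an explicit capture group if the pattern has one, and otherwise pulls the digits out of the full match. The library may achieve this differently (for example by rewriting the regex).

```python
import re
from typing import Optional


def extract(pattern: str, text: str) -> Optional[str]:
    """Return the captured group, or only the digits of the match (hypothetical helper)."""
    compiled = re.compile(pattern)
    match = compiled.search(text)
    if match is None:
        return None
    if compiled.groups:
        # The pattern has an explicit capture group: use it as-is.
        return match.group(1)
    # No capture group: keep only the numeric part of the match.
    digits = re.search(r"\d+", match.group(0))
    return digits.group(0) if digits else match.group(0)


print(extract(r"cam\d+", "cam15_count123.txt"))    # 15
print(extract(r"count\d+", "cam15_count123.txt"))  # 123
print(extract(r"camA(\d+)", "camA15_B.txt"))       # 15
```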

API Reference

from pathlib import Path
from typing import Any, Optional, Union

class FilePathParser:
    def __init__(
        self,
        *groups: Any,                     # Any number of lists, enums, dicts, or strings (group name auto-detected)
        date: bool = False,               # Extract date? (default: False)
        time: bool = False,               # Extract time? (default: False)
        separator: str = "_",
        priority: str = "filename",       # or "path"
        patterns: Optional[dict] = None,  # e.g. {"cam": r"cam\d+"}
    ) -> None: ...

    def parse(self, full_path: Union[str, Path]) -> dict:
        """
        Returns a dict {group: value or None, ...}.
        """

Group name is auto-generated:
- Enum: lowercase enum class name.
- Dict: key as group name.
- List/tuple/set: groupN (N = order of argument).
- String: value as group name.
If a group is not found or is invalid, its value in the result is None.
Date and time are always validated (None is returned if the value is not a real date/time).
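
The naming rules above can be written down as a short function. This is only a sketch of the documented rules, using a hypothetical `name_groups` helper; it is not the library's constructor code.

```python
from enum import Enum
from typing import Any, Dict, List


def name_groups(*groups: Any) -> Dict[str, List[str]]:
    """Map each positional group to a name following the documented rules."""
    named: Dict[str, List[str]] = {}
    for position, group in enumerate(groups, start=1):
        if isinstance(group, type) and issubclass(group, Enum):
            # Enum: lowercase class name, values taken from the members.
            named[group.__name__.lower()] = [str(member.value) for member in group]
        elif isinstance(group, dict):
            # Dict: each key is a group name; a single string counts as one value.
            for key, values in group.items():
                named[key] = [values] if isinstance(values, str) else list(values)
        elif isinstance(group, (list, tuple, set)):
            # List/tuple/set: groupN, where N is the argument position.
            named[f"group{position}"] = list(group)
        elif isinstance(group, str):
            # String: the value itself is the group name.
            named[group] = [group]
    return named


class Status(Enum):
    OPEN = "open"
    CLOSED = "closed"


named = name_groups(Status, ["alpha", "beta"], {"flag": "urgent"})
print(named)
# {'status': ['open', 'closed'], 'group2': ['alpha', 'beta'], 'flag': ['urgent']}
```

Note that the list gets the name group2 because it is the second argument, which matches the output of usage example 4.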

How It Works

Splits filename and path into “blocks” (by _, -, ., /, etc).
For each group, tries to find an exact match (for enums, lists, dicts).
For date and time:
- Matches all supported formats via regex.
- Validates with datetime.strptime.
For custom patterns:
- If your regex has no capture group, e.g. "cam\d+" or "count\d{2,4}", the parser returns only the digits.
- If you provide an explicit capture group, e.g. "label(\d+)", the parser returns that group as-is.

If both path and filename have a group, the value from priority wins.
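
The first two steps (splitting into blocks and exact matching) can be sketched in a few lines. The separator set and the `split_blocks` and `match_groups` helpers are assumptions made for illustration; the library's own splitting and matching may differ in detail.

```python
import re
from typing import Dict, List, Optional, Sequence

# Assumed separator set: underscore, hyphen, dot, slash, backslash, space.
SEPARATORS = r"[_\-./\\ ]+"


def split_blocks(text: str) -> List[str]:
    """Split a file name or path into non-empty blocks."""
    return [block for block in re.split(SEPARATORS, text) if block]


def match_groups(text: str, groups: Dict[str, Sequence[str]]) -> Dict[str, Optional[str]]:
    """For each group, return the first allowed value that equals a block, else None."""
    blocks = {block.lower() for block in split_blocks(text)}
    return {
        name: next((value for value in allowed if value.lower() in blocks), None)
        for name, allowed in groups.items()
    }


print(split_blocks("cat_night_dev_cam08_20240622_1542.jpg"))
# ['cat', 'night', 'dev', 'cam08', '20240622', '1542', 'jpg']
print(match_groups(
    "cat_night_dev_cam08_20240622_1542.jpg",
    {"group1": ["cat", "dog"], "group2": ["night", "day"], "level": ["junior"]},
))
# {'group1': 'cat', 'group2': 'night', 'level': None}
```

Matching whole blocks rather than substrings is what keeps a value such as "cat" from matching inside a longer block like "category".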

Notes

A group's value in the result is None if it is not found or not valid.
If both the path and the filename contain the group, the value from the source named by priority wins.
You can use any number of groups or patterns; there is no hard limit.

Command-Line Interface (CLI) for FilePathParser

The library supports a convenient command-line interface (CLI) for extracting structured information from file names and paths.

🚀 Quick Start

After installing dependencies with Poetry, you can use the file-path-parser utility to parse file names directly from your terminal.

Example usage

poetry run file-path-parser "cat_night_cam15_20240619_1236.jpg" --groups cat dog --classes night day --date --time --pattern cam "cam\d{1,3}"

Show help

poetry run file-path-parser --help

CLI Options

filepath — Path or file name to parse
--groups — List of allowed groups (e.g. cat dog)
--classes — List of allowed classes (e.g. night day)
--date — Enable date parsing
--time — Enable time parsing
--pattern NAME REGEX — Add custom pattern (can be used multiple times)

Example

poetry run file-path-parser "dog_day_cam2_20240701_0800.jpg" --groups cat dog --classes night day --date --time --pattern cam "cam\d{1,3}"

The parsing result will be displayed in the terminal.
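
For readers who want to see how options like these are typically wired up, here is a hedged argparse sketch that accepts the same arguments as the table above. It is an assumption about how such a CLI could be declared, not the package's actual entry point, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="file-path-parser")
    parser.add_argument("filepath", help="Path or file name to parse")
    parser.add_argument("--groups", nargs="*", default=[], help="Allowed groups, e.g. cat dog")
    parser.add_argument("--classes", nargs="*", default=[], help="Allowed classes, e.g. night day")
    parser.add_argument("--date", action="store_true", help="Enable date parsing")
    parser.add_argument("--time", action="store_true", help="Enable time parsing")
    # --pattern takes two values and may be repeated.
    parser.add_argument(
        "--pattern", nargs=2, action="append", default=[],
        metavar=("NAME", "REGEX"), help="Add custom pattern",
    )
    return parser


args = build_parser().parse_args([
    "dog_day_cam2_20240701_0800.jpg",
    "--groups", "cat", "dog",
    "--classes", "night", "day",
    "--date", "--time",
    "--pattern", "cam", r"cam\d{1,3}",
])
patterns = dict(args.pattern)  # [["cam", "cam\\d{1,3}"]] becomes {"cam": "cam\\d{1,3}"}
print(args.filepath, args.groups, args.classes, args.date, args.time, patterns)
```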

Contributing

Pull requests, bug reports and feature requests are welcome!

Project Board

All ongoing development, task tracking, and planning for this library is managed in the Project Board.

See what's in progress, planned, or completed
Follow the roadmap and feature development
Suggest improvements or report issues via Issues, which are linked directly to the board

Visit the Project Board →

FAQ / Known Issues

Q: My pattern is `"cam\d+"` — why does the result return only the number?

A: For user convenience, the parser automatically extracts only the digits from patterns like "cam\d+" or "count\d+".
To control exactly which part is returned, add an explicit capture group, e.g. "cam(\d+)"; the parser then uses your group as-is.

Q: What happens if both the path and filename contain the same group, but with different values?

A: The result depends on the priority parameter:

If priority="filename" (default), the group value from the filename wins.
If priority="path", the value from the directory path wins.

Q: Can I use non-Latin or Unicode characters in group values?

A: Yes. Groups and blocks are matched in a case-insensitive way and support Unicode.
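
As an illustration of what case-insensitive Unicode matching involves, the sketch below compares values with str.casefold, which handles non-Latin scripts and special cases such as the German "ß". The `match_value` helper is hypothetical, and returning the spelling from the allowed list (rather than the spelling found in the file name) is an assumption.

```python
from typing import Iterable, Optional


def match_value(block: str, allowed: Iterable[str]) -> Optional[str]:
    """Return the allowed value equal to the block, ignoring case (hypothetical helper)."""
    folded = block.casefold()
    for value in allowed:
        if value.casefold() == folded:
            return value
    return None


print(match_value("КОТ", ["кот", "собака"]))  # кот
print(match_value("STRASSE", ["straße"]))     # straße
print(match_value("bird", ["cat", "dog"]))    # None
```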

Q: What separators does the parser recognize between blocks?

A: By default, the parser splits by any of these: _, -, ., /, \, {}, or space. If your files use custom separators, let us know!

Q: What if a value looks like a date/time, but is not real?

A: The parser validates all dates/times. "20241341" (wrong month/day) will not be recognized as a date, etc.

Known Issues

If your separator is unusual (not in the list above), you may need to pre-process filenames.
Extremely exotic date/time formats (not listed in "Supported formats") are not matched.
Path parsing supports both str and pathlib.Path, but network/multiplatform paths (e.g., UNC, SMB) are not specifically tested.

About PatternMatcher.find_special

Note:
The method PatternMatcher.find_special() is currently not used in the main library code.
It exists as a universal interface for dynamic field extraction by key (date, time, or any custom pattern) and may be useful for advanced integrations, future extensions, or dynamic user queries.

Author

License

MIT


