Skip to main content

Python library to find missing files in a scheduled delivery.

Project description

filehole

Python library to find missing files in a scheduled delivery.

Simple and quick solution to help finding missing files in a scheduled delivery, particularly when dealing with large amount of files or a long history of file delivery.

Dependencies

  • Python 3.8.9 or higher
  • Numpy 1.23.0
  • Holidays 0.14.2

Install

The latest stable version can always be installed or updated via pip:

$ pip install filehole

Usage

filehole(
    path_to_files: str,
    file_system: Globable,
    date_pattern: str,
    date_format: str,
    country: str,
    subdivision: str = None,
    start_date: str = f"{date.today().year}-01-01",
    end_date: str = date.today().strftime("%Y-%m-%d"),
    week_schedule: str = "1111100",
    frequency: str = "D",
    repetition: int = 1,
    position: int = 1,
)

Parameters:

  • path_to_files : Wild card enabled string to search for files
  • file_system : Modules that have a glob function such as glob in a local environment or adls in a cloud environment.
  • date_pattern : Regular expression reflecting the pattern in which the date is written in files or directories.
  • date_format : Standard date format of the date written in files or directories.
  • country : Country name or abbreviation for the selection of the holidays calendar. For the exhaustive list of available holidays calendars, please refer to the documentation of the holidays python library (https://pypi.org/project/holidays/).
  • subdivision : Province, state, ... for the selection of the holidays calendar. The available option can be found in the documentation of the holidays python library (https://pypi.org/project/holidays/).
  • start_date : Start of the search period. Format: '%Y-%m-%d'. Default is set to the first day of the current year.
  • end_date : End of the search period. Format: '%Y-%m-%d'. Default is set to the current date.
  • week_schedule: String of 7 digits of 0 and 1. 1 represents a working day and 0 a non-working day. Week starts on Monday. By default, the working week is set from Monday to Friday included -> '1111100'.
  • frequency : Takes 'D' for daily delivery, 'W' for weekly delivery and 'M' for monthly delivery.
  • repetition : Default value: 1. Used only for weekly and monthly file delivery. e.g.: repetition=1 -> every week/month, repetition=2 -> every two weeks/months...
  • position : Takes 1 for first business day of the month or -1 for last business day of the month.

Description:

Retrieve list of files from a given location.
Extract dates from filenames.
Create a calendar of holidays.
Create a set of expected dates and compare them to the extracted dates.
Return a set of missing dates.

Example:

  • Daily file delivery for the month of July according to the french holiday calendar, assuming that files from the 12 and 13 of July are missing:
> filehole(
        path_to_files="my_file_path/*.txt",  
        file_system=glob,
        date_pattern=r"[0-9]{8}",
        date_format="%Y%m%d",
        country="FR",
        start_date="2022-07-01",
        end_date="2022-07-31",
        week_schedule="1111100",
        frequency="D",
    )
> {datetime.date(2022, 7, 12), datetime.date(2022, 7, 13)}

Limitations

  • All files are expected to be at the same level.
  • The files or the directory containing the files should contain a date in their name.
  • Current version works only with daily, weekly and monthly file delivery.

License

This project is under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

filehole-0.0.4.tar.gz (4.8 kB view details)

Uploaded Source

Built Distribution

filehole-0.0.4-py3-none-any.whl (5.2 kB view details)

Uploaded Python 3

File details

Details for the file filehole-0.0.4.tar.gz.

File metadata

  • Download URL: filehole-0.0.4.tar.gz
  • Upload date:
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.9

File hashes

Hashes for filehole-0.0.4.tar.gz
Algorithm Hash digest
SHA256 1f6a488cae9460fcb981b2864e1b8b246ea9a35d1ee40ee6418cdf98274458ea
MD5 f045acca01e6bc5acd4da3c669a58030
BLAKE2b-256 aba30bd45a549cd5c7dad7f0ee71b24454c04020c61a3ccd4d8c7c8dfcf6a239

See more details on using hashes here.

File details

Details for the file filehole-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: filehole-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.9

File hashes

Hashes for filehole-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 86a8232d806a5cb7ce5ced9568f000b812a6641d74c1f6ecefa8a1f993a416fc
MD5 3ca23b135cbde150c77af88ff3a5993b
BLAKE2b-256 a1c1e330e46dd0632dd1e61d05928283649fbff1e7aee0ef206733b10041c588

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page