Skip to main content

Python library to find missing files in a scheduled delivery.

Project description

filehole

Python library to find missing files in a scheduled delivery.

Simple and quick solution to help finding missing files in a scheduled delivery, particularly when dealing with large amount of files or a long history of file delivery.

Dependencies

  • Python 3.8.9 or higher
  • Numpy 1.23.0
  • Holidays 0.14.2

Install

The latest stable version can always be installed or updated via pip:

$ pip install filehole

Usage

filehole(
    path_to_files: str,
    file_system: Globable,
    date_pattern: str,
    date_format: str,
    country: str,
    subdivision: str = None,
    start_date: str = f"{date.today().year}-01-01",
    end_date: str = date.today().strftime("%Y-%m-%d"),
    week_schedule: str = "1111100",
    frequency: str = "D",
    repetition: int = 1,
    position: int = 1,
)

Parameters:

  • path_to_files : Wild card enabled string to search for files
  • file_system : Modules that have a glob function such as glob in a local environment or adls in a cloud environment.
  • date_pattern : Regular expression reflecting the pattern in which the date is written in files or directories.
  • date_format : Standard date format of the date written in files or directories.
  • country : Country name or abbreviation for the selection of the holidays calendar. For the exhaustive list of available holidays calendars, please refer to the documentation of the holidays python library (https://pypi.org/project/holidays/).
  • subdivision : Province, state, ... for the selection of the holidays calendar. The available option can be found in the documentation of the holidays python library (https://pypi.org/project/holidays/).
  • start_date : Start of the search period. Format: '%Y-%m-%d'. Default is set to the first day of the current year.
  • end_date : End of the search period. Format: '%Y-%m-%d'. Default is set to the current date.
  • week_schedule: String of 7 digits of 0 and 1. 1 represents a working day and 0 a non-working day. Week starts on Monday. By default, the working week is set from Monday to Friday included -> '1111100'.
  • frequency : Takes 'D' for daily delivery, 'W' for weekly delivery and 'M' for monthly delivery.
  • repetition : Default value: 1. Used only for weekly and monthly file delivery. e.g.: repetition=1 -> every week/month, repetition=2 -> every two weeks/months...
  • position : Takes 1 for first business day of the month or -1 for last business day of the month.

Description:

Retrieve list of files from a given location.
Extract dates from filenames.
Create a calendar of holidays.
Create a set of expected dates and compare them to the extracted dates.
Return a set of missing dates.

Example:

  • Daily file delivery for the month of July according to the french holiday calendar, assuming that files from the 12 and 13 of July are missing:
> filehole(
        path_to_files="my_file_path/*.txt",  
        file_system=glob,
        date_pattern=r"[0-9]{8}",
        date_format="%Y%m%d",
        country="FR",
        start_date="2022-07-01",
        end_date="2022-07-31",
        week_schedule="1111100",
        frequency="D",
    )
> {datetime.date(2022, 7, 12), datetime.date(2022, 7, 13)}

Limitations

  • All files are expected to be at the same level.
  • The files or the directory containing the files should contain a date in their name.
  • Current version works only with daily, weekly and monthly file delivery.

License

This project is under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

filehole-0.0.4.tar.gz (4.8 kB view hashes)

Uploaded source

Built Distribution

filehole-0.0.4-py3-none-any.whl (5.2 kB view hashes)

Uploaded py3

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page