Intake plugin for specifying a file-path pattern which can represent a number of different entries

Intake Pattern Catalog

intake-pattern-catalog is a plugin for Intake which allows you to specify a file-path pattern which can represent a number of different entries.

Note that this is different from the patterns you can write with the csv driver, which get turned into a single entry.

Installation instructions

pip install intake-pattern-catalog
# or
conda install intake-pattern-catalog

Usage

Set driver: pattern_cat on a source to use this plugin in your catalogs.

Consider the following list of files in an S3 bucket:

  • bucket-name/folder/a_1.csv
  • bucket-name/folder/b_1.csv
  • bucket-name/folder/c_1.csv
  • bucket-name/folder/a_2.csv
  • bucket-name/folder/b_2.csv

And the following catalog definition YAML file:

---
metadata:
  version: 1
sources:
  stuff:
    description: Stuff and things
    driver: pattern_cat
    args:
      urlpath: "s3://bucket-name/folder/{foo}_{bar}.csv"
      driver: csv

Derived datasets

If you would like to create a derived dataset based on a pattern_cat dataset, you can use driver: pattern_cat_transform, which will apply a transformation function to each entry returned by get_entry. For example, you can add to the above example yaml file:

  stuff_transformed:
    description: Everything in stuff, doubled
    driver: pattern_cat_transform
    args:
      targets:
        - stuff
      transform: "path.to.doubling_function"
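
As a rough illustration, the function named by transform could be as small as the sketch below. It assumes the function receives the loaded data for one entry (for example a dataframe) and returns the transformed data. The exact calling convention is not stated here, so check it against the plugin before relying on it. The name doubling_function simply mirrors the placeholder path above.

```python
def doubling_function(data):
    """Return `data` with every value doubled.

    Assumption: `data` supports multiplication by a scalar, as a
    pandas DataFrame or NumPy array does. This is an illustrative
    sketch, not code taken from the plugin.
    """
    return data * 2
```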

Catalog API

Access entry by kwargs:

> catalog.stuff.get_entry(foo='a', bar=1)
sources:
  foo_a_bar_1:
    args:
      storage_options:
        use_listings_cache: false
      urlpath: s3://bucket-name/folder/a_1.csv
    description: ''
    driver: intake.source.csv.CSVSource
    metadata:
      catalog_dir: ...

Note that this entry could also be accessed with catalog.stuff.foo_a_bar_1.

See all valid kwarg combinations:

> catalog.stuff.get_entry_kwarg_sets()
[
    {"foo": "a", "bar": "1"},
    {"foo": "b", "bar": "1"},
    {"foo": "c", "bar": "1"},
    {"foo": "a", "bar": "2"},
    {"foo": "b", "bar": "2"},
]
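
To make the mapping from pattern to kwargs concrete, here is a minimal standard-library sketch that reproduces the list above from the example file names. It is not the plugin's implementation: the helper names (pattern_to_regex, kwarg_sets, entry_name) are hypothetical, the s3:// scheme is omitted for brevity, and the entry-naming rule is inferred from the single foo_a_bar_1 example.

```python
import re


def pattern_to_regex(pattern):
    """Turn '{foo}_{bar}.csv' into a regex with one named group per field."""
    # re.split with a capture group alternates literal text and field names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)          # literal text
        else:
            regex += f"(?P<{part}>[^/]+?)"    # field: shortest run without '/'
    return re.compile(regex)


def kwarg_sets(pattern, paths):
    """Return one dict of field values for every path that fits the pattern."""
    regex = pattern_to_regex(pattern)
    matches = (regex.fullmatch(path) for path in paths)
    return [m.groupdict() for m in matches if m]


def entry_name(kwargs):
    """Build a name like 'foo_a_bar_1' (assumed convention, see lead-in)."""
    return "_".join(f"{key}_{value}" for key, value in kwargs.items())


pattern = "bucket-name/folder/{foo}_{bar}.csv"
paths = [
    "bucket-name/folder/a_1.csv",
    "bucket-name/folder/b_1.csv",
    "bucket-name/folder/c_1.csv",
    "bucket-name/folder/a_2.csv",
    "bucket-name/folder/b_2.csv",
]

sets = kwarg_sets(pattern, paths)
for kwargs in sets:
    print(entry_name(kwargs), kwargs)
```

Field values come back as strings because they are parsed from the path, which matches the quoted "1" and "2" in the list above.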

Caching

The default way to control caching in a pattern catalog is a ttl (in seconds), an optional value under args. It specifies how long the catalog waits after fetching the list of files that match the pattern before fetching that list again. The default ttl is 60 seconds. To force the catalog to always get the latest list of available entries, set the ttl to 0.
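
The ttl behaviour described above can be sketched as a small time-based cache around a listing function. This is an illustration under stated assumptions, not the plugin's code: the class name TTLListing, the fetch callable, and the injectable clock are all hypothetical.

```python
import time


class TTLListing:
    """Cache the result of `fetch()` and refresh it once `ttl` seconds pass."""

    def __init__(self, fetch, ttl=60, clock=time.monotonic):
        self._fetch = fetch          # callable returning the list of files
        self._ttl = ttl              # seconds to keep a listing; 0 = never reuse
        self._clock = clock          # injectable so the behaviour can be tested
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = (
            self._fetched_at is None
            or now - self._fetched_at >= self._ttl
        )
        if expired:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

With ttl=0 the elapsed time is always at least the ttl, so every call to get() fetches a fresh listing, which is the "always get the latest" behaviour.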
