A pared-down metadata scraper + SQL runner.

whale-pipelines

whale-pipelines is a library based on Amundsen's databuilder that extracts metadata into whale's markdown format. The library reads static config files in ~/.whale/ to establish connections and customize the scraping process. It also provides hooks into SQLAlchemy for running SQL queries against these locally defined connections, without having to specify a connection string on every request.

For information on the full CLI platform, visit whale.

There are two main functions: pull, which handles metadata extraction, and run, which enables execution of SQL queries.

pull

While whale invokes build_script.py to run pull, that script does nothing more than call pull() with some logging set up around it. If you'd like to pare this down or write a custom CI/CD pipeline, all you need to do is:

pip install whale-pipelines

then run:

import whale as wh
wh.pull()
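
As a rough illustration of "pull() with some logging set up around it", a custom pipeline step might look like the sketch below. This is not the contents of build_script.py; the helper name run_pull and the logger name whale_ci are hypothetical, and the pull function is passed in as an argument so the wrapper does not depend on how whale is installed.

```python
import logging

# Hypothetical logger name; pick whatever fits your pipeline.
logger = logging.getLogger("whale_ci")


def run_pull(pull_fn):
    """Call pull_fn once and return a process-style exit code.

    Returns 0 if pull_fn completed, 1 if it raised an exception.
    """
    logger.info("Starting metadata pull")
    try:
        pull_fn()
    except Exception:
        # logger.exception records the traceback alongside the message.
        logger.exception("Metadata pull failed")
        return 1
    logger.info("Metadata pull finished")
    return 0
```

In a CI job, one would presumably configure logging with logging.basicConfig(level=logging.INFO), import whale as wh, and finish with sys.exit(run_pull(wh.pull)) so that a failed pull fails the job.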

run

While libraries like pyodbc, sqlalchemy, pyhive, etc. provide easy-to-use interfaces against a warehouse, their stateless nature can make them a bit repetitive: whenever you need to write a query, you generally need to open a cursor, specifying your warehouse URI and credentials. run is a thin wrapper around SQLAlchemy that automatically opens a connection against one of the connections defined in ~/.whale/config/connections.yaml.

To use this, simply run:

import whale as wh
wh.run()

A warehouse_name kwarg can be specified, which forces run to connect to the first warehouse whose name field matches the argument. If it is not given, the first warehouse in the list is used.
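
The selection rule above can be sketched in a few lines. This is an illustration under stated assumptions, not whale's implementation: it uses the standard-library sqlite3 module as a stand-in for SQLAlchemy, the CONNECTIONS list stands in for the entries in connections.yaml, the database field and the helper names select_connection and run_sql are hypothetical, and raising KeyError when no name matches is a choice made here, not documented behavior.

```python
import sqlite3

# Hypothetical stand-in for the warehouses listed in connections.yaml.
# Only the "name" field is taken from the text; "database" is assumed.
CONNECTIONS = [
    {"name": "primary", "database": ":memory:"},
    {"name": "analytics", "database": ":memory:"},
]


def select_connection(connections, warehouse_name=None):
    """Return the first entry whose name matches, or the first entry overall."""
    if not connections:
        raise ValueError("no connections are defined")
    if warehouse_name is None:
        return connections[0]
    for entry in connections:
        if entry.get("name") == warehouse_name:
            return entry
    # Assumption: an unknown name is treated as an error.
    raise KeyError(f"no warehouse named {warehouse_name!r}")


def run_sql(sql, warehouse_name=None, connections=CONNECTIONS):
    """Open the selected connection, run one statement, return all rows."""
    entry = select_connection(connections, warehouse_name)
    conn = sqlite3.connect(entry["database"])
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

The caller only names a warehouse (or nothing at all); the URI and credentials stay in the config, which is the repetition the wrapper is meant to remove.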

Download files

Source Distribution

whale-pipelines-1.5.1.tar.gz (35.0 kB)

Built Distribution

whale_pipelines-1.5.1-py2.py3-none-any.whl (48.8 kB, Python 2 and Python 3)
