
A pared-down metadata scraper + SQL runner.

Project description


whale-pipelines is a library based on amundsen's databuilder that enables easy extraction of metadata into whale's markdown format. The library references static config files in ~/.whale/ to establish connections and customize the scraping process. Whale also provides hooks into SQLAlchemy for easy execution of SQL queries against these locally defined connections, without having to specify connection strings at every request.

For information on the full CLI platform, visit whale.

There are two main functions: pull, which handles metadata extraction, and run, which enables execution of SQL queries.


While the whale CLI invokes pull through its own wrapper, that wrapper does nothing more than call pull(), with some logging set up around it. If, therefore, you'd like to pare things down or write a custom CI/CD pipeline, all you need to do is:

pip install whale-pipelines

then run:

import whale as wh
wh.pull()  # extract metadata for the connections defined in ~/.whale/


While libraries like pyodbc, sqlalchemy, and pyhive provide easy-to-use interfaces against a warehouse, their stateless nature can make querying repetitive -- whenever you need to write a query, you generally need to open a cursor, specifying your warehouse URI and credentials. While somewhat trivial, run simply wraps SQLAlchemy, opening a connection automatically against the connections defined in ~/.whale/config/connections.yaml.
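For contrast, this is the kind of per-query boilerplate that run spares you. A minimal stdlib sketch using sqlite3 as a stand-in for a real warehouse driver (not whale's actual internals):

```python
# The manual pattern: open a connection (with URI/credentials), open a
# cursor, execute, fetch, close -- repeated for every ad hoc query.
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a warehouse connection string
cursor = conn.cursor()
cursor.execute("SELECT 1 + 1")
result = cursor.fetchone()[0]
conn.close()
print(result)  # 2
```

run keeps this stateful setup behind a single call by reading the connection details from your local config instead.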

To use this, simply run:

import whale as wh
wh.run("SELECT * FROM my_table")  # my_table is a placeholder

A warehouse_name kwarg can be specified, which forces run to establish a connection with the first warehouse whose name field matches the argument passed. If not given, the first warehouse in the list will be used.
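For reference, a hypothetical sketch of what ~/.whale/config/connections.yaml might contain. Only the name field is confirmed above; the other keys are illustrative placeholders, not whale's documented schema:

```yaml
# Hypothetical connections.yaml -- keys other than `name` are assumptions.
- name: analytics_warehouse
  metadata_source: presto
  uri: host.your-warehouse.com
  port: 8889
```

With a file like this, wh.run(..., warehouse_name="analytics_warehouse") would select this entry; omitting the kwarg would select the first entry in the list.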

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

whale-pipelines-1.5.1.tar.gz (35.0 kB)

Uploaded source

Built Distribution

whale_pipelines-1.5.1-py2.py3-none-any.whl (48.8 kB)

Uploaded py2 py3
