A simple data ingestion library to guide data flows from some places to other places

Viadot

Documentation: https://dyvenia.github.io/viadot/

Source Code: https://github.com/dyvenia/viadot



Getting Data from a Source

Viadot supports several API and RDBMS sources, both private and public. Among public sources, we currently support the UK Carbon Intensity API, which the examples below use.

from viadot.sources.uk_carbon_intensity import UKCarbonIntensity
ukci = UKCarbonIntensity()
ukci.query("/intensity")
df = ukci.to_df()
df

Output:

   from               to                 forecast  actual  index
0  2021-08-10T11:00Z  2021-08-10T11:30Z  211       216     moderate

The df above is a pandas DataFrame containing data that viadot downloaded from the UK Carbon Intensity API.
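To illustrate where the flat columns come from, here is a minimal sketch that flattens a nested JSON payload into rows. It is not viadot's implementation: the payload shape is an assumption based on the public Carbon Intensity API, the values are copied from the output above, and flatten_intensity is a hypothetical helper name.

```python
def flatten_intensity(payload):
    """Turn the nested API payload into a list of flat row dicts."""
    rows = []
    for entry in payload["data"]:
        row = {"from": entry["from"], "to": entry["to"]}
        row.update(entry["intensity"])  # adds forecast, actual, index
        rows.append(row)
    return rows


# Assumed payload shape; the values are taken from the output shown above.
payload = {
    "data": [
        {
            "from": "2021-08-10T11:00Z",
            "to": "2021-08-10T11:30Z",
            "intensity": {"forecast": 211, "actual": 216, "index": "moderate"},
        }
    ]
}

rows = flatten_intensity(payload)
print(list(rows[0]))  # ['from', 'to', 'forecast', 'actual', 'index']
```

Passing rows to pandas.DataFrame(rows) would give a frame with the same five columns.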

Loading Data to a Source

Viadot provides different upload methods depending on the source: bulk inserts for SQL sources and file uploads for data lake sources. We also provide ready-made pipelines that include data validation steps using Great Expectations.

An example of loading data into SQLite from a pandas DataFrame using the SQLiteInsert Prefect task:

from viadot.tasks import SQLiteInsert

# TABLE_NAME, dtypes, database_path, and df must be defined beforehand.
insert_task = SQLiteInsert()
insert_task.run(
    table_name=TABLE_NAME, dtypes=dtypes, db_path=database_path, df=df, if_exists="replace"
)
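For intuition about what a "replace" bulk insert amounts to, here is a rough sketch using only the standard library's sqlite3 module. It is not how SQLiteInsert is implemented; the helper replace_and_insert, the table name carbon_intensity, and the column types are assumptions chosen for illustration.

```python
import sqlite3


def replace_and_insert(conn, table_name, dtypes, rows):
    """Drop and recreate table_name, then bulk-insert rows (a list of dicts)."""
    columns = list(dtypes)
    # Quote identifiers because "from", "to", and "index" are SQL keywords.
    column_defs = ", ".join(f'"{name}" {sql_type}' for name, sql_type in dtypes.items())
    placeholders = ", ".join("?" for _ in columns)
    with conn:  # commits on success, rolls back on error
        conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
        conn.execute(f'CREATE TABLE "{table_name}" ({column_defs})')
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})',
            [tuple(row[col] for col in columns) for row in rows],
        )


# Hypothetical column types and a single row copied from the earlier output.
dtypes = {
    "from": "TEXT",
    "to": "TEXT",
    "forecast": "INTEGER",
    "actual": "INTEGER",
    "index": "TEXT",
}
rows = [
    {
        "from": "2021-08-10T11:00Z",
        "to": "2021-08-10T11:30Z",
        "forecast": 211,
        "actual": 216,
        "index": "moderate",
    }
]

conn = sqlite3.connect(":memory:")
replace_and_insert(conn, "carbon_intensity", dtypes, rows)
replace_and_insert(conn, "carbon_intensity", dtypes, rows)  # replaces, does not append
stored = conn.execute('SELECT "forecast", "actual" FROM "carbon_intensity"').fetchall()
```

Calling the helper twice leaves a single row in the table, which is the behaviour one would expect from a replace-style load as opposed to an append.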

Running tests

To run tests, log into the container and run pytest:

cd viadot/docker
./run.sh
docker exec -it viadot_testing bash
pytest
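For orientation, pytest collects functions named test_* from files named test_*.py and treats a failing assert as a test failure. The following is a minimal sketch of such a test using only the standard library; the function, the file name test.sqlite, and the table are hypothetical and not part of viadot's test suite.

```python
import os
import sqlite3
import tempfile


def test_sqlite_round_trip():
    """Write one row to a temporary SQLite file and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        db_path = os.path.join(tmp_dir, "test.sqlite")  # hypothetical file name
        conn = sqlite3.connect(db_path)
        try:
            conn.execute("CREATE TABLE intensity (forecast INTEGER, actual INTEGER)")
            conn.execute("INSERT INTO intensity VALUES (?, ?)", (211, 216))
            conn.commit()
            result = conn.execute("SELECT forecast, actual FROM intensity").fetchall()
            assert result == [(211, 216)]
        finally:
            # Close before the temporary directory is removed.
            conn.close()
```

Saved in a file such as test_example.py (also a hypothetical name), this would be picked up by the pytest command above.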

Running flows locally

You can run the example flows from the terminal:

viadot/docker/run.sh
docker exec -it viadot_testing bash
FLOW_NAME=hello_world; python -m viadot.examples.$FLOW_NAME

When developing, however, it is easiest to use the provided JupyterLab container, available at http://localhost:9000/.

How to contribute

  1. Clone the release branch
  2. Pull the docker env by running viadot/docker/update.sh -t dev
  3. Run the env with viadot/docker/run.sh
  4. Log into the dev container and install viadot in development (editable) mode, so that code changes take effect without reinstalling:
docker exec -it viadot_testing bash
pip install -e .
  5. Edit and test your changes with pytest
  6. Submit a PR. The PR should contain the following:
     • new/changed functionality
     • tests for the changes
     • changes added to CHANGELOG.md
     • any other relevant resources updated (esp. viadot/docs)

Please follow the standards and best practices used within the library (e.g. when adding tasks, see how other tasks are constructed). If you have any questions, please reach out to us here on GitHub.
