A simple data ingestion library to guide data flows from some places to other places

Viadot

Documentation: https://dyvenia.github.io/viadot/

Source Code: https://github.com/dyvenia/viadot


Getting Data from a Source

Viadot supports several API and RDBMS sources, both private and public. Among public APIs, we currently support the UK Carbon Intensity API, and the examples below are based on it.

from viadot.sources.uk_carbon_intensity import UKCarbonIntensity

ukci = UKCarbonIntensity()
ukci.query("/intensity")  # request the current carbon intensity
df = ukci.to_df()  # convert the response to a pandas DataFrame
df

Output:

                from                 to  forecast  actual     index
0  2021-08-10T11:00Z  2021-08-10T11:30Z       211     216  moderate

The df above is a pandas DataFrame containing the data that viadot downloaded from the UK Carbon Intensity API.
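
To make the shape of that table concrete, here is a minimal sketch of how one nested API record could be flattened into a single row. It is not viadot's implementation: the hard-coded sample_response and the flatten_intensity helper are hypothetical, and the sketch assumes the API nests forecast, actual, and index under an "intensity" key. The values simply mirror the output shown above.

```python
# Hard-coded sample shaped like a Carbon Intensity "/intensity" response.
# Assumption: readings are nested under "intensity"; values are illustrative.
sample_response = {
    "data": [
        {
            "from": "2021-08-10T11:00Z",
            "to": "2021-08-10T11:30Z",
            "intensity": {"forecast": 211, "actual": 216, "index": "moderate"},
        }
    ]
}


def flatten_intensity(payload):
    """Turn each nested record into one flat row (one dict per row)."""
    rows = []
    for record in payload["data"]:
        row = {"from": record["from"], "to": record["to"]}
        # Lift forecast/actual/index to the top level of the row.
        row.update(record["intensity"])
        rows.append(row)
    return rows


rows = flatten_intensity(sample_response)
print(rows[0])
```

Each dict in rows corresponds to one DataFrame row, with keys in the same order as the columns above.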

Loading Data to a Source

Depending on the source, viadot provides different methods of uploading data. For instance, for SQL sources, this would be bulk inserts. For data lake sources, it would be a file upload. We also provide ready-made pipelines including data validation steps using Great Expectations.

An example of loading data into SQLite from a pandas DataFrame using the SQLiteInsert Prefect task:

from viadot.tasks import SQLiteInsert

# TABLE_NAME, dtypes, database_path, and df are placeholders to define beforehand
insert_task = SQLiteInsert()
insert_task.run(table_name=TABLE_NAME, dtypes=dtypes, db_path=database_path, df=df, if_exists="replace")
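
For readers unfamiliar with the pattern, the sketch below shows roughly what a "replace, then bulk insert" load into SQLite amounts to, using only the standard library. It is an illustration under stated assumptions, not the SQLiteInsert task itself: the replace_and_insert helper, the table name, the column types, and the in-memory database are all hypothetical stand-ins for TABLE_NAME, dtypes, database_path, and df.

```python
import sqlite3


def replace_and_insert(conn, table_name, dtypes, rows):
    """Recreate `table_name` with the given column types and bulk-insert `rows`."""
    columns = ", ".join(f'"{name}" {sql_type}' for name, sql_type in dtypes.items())
    placeholders = ", ".join("?" for _ in dtypes)
    with conn:  # commits on success, rolls back on error
        # Dropping first gives the "replace" behaviour: old rows are discarded.
        conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
        conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
        # executemany sends all rows in one call (the bulk insert).
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})',
            [tuple(row[name] for name in dtypes) for row in rows],
        )


# Hypothetical values; an in-memory database keeps the example self-contained.
dtypes = {
    "from": "TEXT",
    "to": "TEXT",
    "forecast": "INTEGER",
    "actual": "INTEGER",
    "index": "TEXT",
}
rows = [
    {
        "from": "2021-08-10T11:00Z",
        "to": "2021-08-10T11:30Z",
        "forecast": 211,
        "actual": 216,
        "index": "moderate",
    }
]

conn = sqlite3.connect(":memory:")
replace_and_insert(conn, "carbon_intensity", dtypes, rows)
replace_and_insert(conn, "carbon_intensity", dtypes, rows)  # replaces, does not append
count = conn.execute('SELECT COUNT(*) FROM "carbon_intensity"').fetchone()[0]
print(count)
```

Column names are double-quoted because from, to, and index are SQL keywords. Running the load twice leaves a single row, which is the behaviour if_exists="replace" suggests.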

Set up

Note: If you're running on Unix, after cloning the repo, you may need to grant executable privileges to the update.sh and run.sh scripts:

sudo chmod +x viadot/docker/update.sh && \
sudo chmod +x viadot/docker/run.sh

a) user

Clone the main branch, enter the docker folder, and set up the environment:

git clone https://github.com/dyvenia/viadot.git && \
cd viadot/docker && \
./update.sh

Run the environment:

./run.sh

b) developer

Clone the dev branch, enter the docker folder, and set up the environment:

git clone -b dev https://github.com/dyvenia/viadot.git && \
cd viadot/docker && \
./update.sh -t dev

Run the environment:

./run.sh -t dev

Install the library in development mode (repeat for the viadot_jupyter_lab container if needed):

docker exec -it viadot_testing pip install -e . --user

Running tests

To run tests, log into the container and run pytest:

docker exec -it viadot_testing bash
pytest

Running flows locally

You can run the example flows from the terminal:

docker exec -it viadot_testing bash
FLOW_NAME=hello_world; python -m viadot.examples.$FLOW_NAME

However, when developing, the easiest way is to use the provided Jupyter Lab container available in the browser at http://localhost:9000/.

How to contribute

  1. Fork the repository if you do not have write access
  2. Set up locally
  3. Test your changes with pytest
  4. Submit a PR. The PR should contain the following:
  • new/changed functionality
  • tests for the changes
  • changes added to CHANGELOG.md
  • any other relevant resources updated (esp. viadot/docs)

Please follow the standards and best practices used within the library (e.g. when adding tasks, see how other tasks are constructed). For any questions, please reach out to us here on GitHub.

Style guidelines

  • the code should be formatted with Black using default settings (the easiest way is to use the VS Code extension)
  • commit messages should:
    • begin with an emoji
    • continue, immediately after the emoji, with one of the following capitalized verbs: "Added", "Updated", "Removed", "Fixed", "Renamed", or, occasionally, another verb such as "Upgraded" or "Downgraded" that fits your particular situation
    • contain a useful description of what the commit is doing
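
The commit message rules above can be checked mechanically. The following is a minimal sketch, not a tool shipped with viadot: the follows_commit_style helper is hypothetical, it accepts only the verbs named explicitly above (the guidelines allow others), and it assumes an emoji is written as a literal Unicode symbol rather than a shortcode such as :sparkles:.

```python
import unicodedata

# Only the verbs named explicitly in the guidelines; the real list is open-ended.
VERBS = {"Added", "Updated", "Removed", "Fixed", "Renamed", "Upgraded", "Downgraded"}


def follows_commit_style(message):
    """Return True if the message looks like '<emoji> <Verb> <description>'."""
    parts = message.split(maxsplit=2)
    if len(parts) < 3:
        return False  # emoji, verb, or description is missing
    emoji, verb, _description = parts
    # Drop the variation selector some emoji carry, then require every
    # remaining character to be in the Unicode "Symbol, other" category.
    symbols = emoji.replace("\ufe0f", "")
    is_emoji = bool(symbols) and all(
        unicodedata.category(ch) == "So" for ch in symbols
    )
    return is_emoji and verb in VERBS


print(follows_commit_style("\u2728 Added the SQLiteInsert task"))
print(follows_commit_style("Added the SQLiteInsert task"))
```

The first call passes because the message starts with an emoji followed by "Added"; the second fails because the emoji is missing. Whether a description is actually useful still needs a human reviewer.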
