A simple data ingestion library to guide data flows from some places to other places
Viadot
Documentation: https://dyvenia.github.io/viadot/
Source Code: https://github.com/dyvenia/viadot
Getting Data from a Source
Viadot supports several API and RDBMS sources, both private and public. The examples below are based on the UK Carbon Intensity API, the public API we currently support.
from viadot.sources.uk_carbon_intensity import UKCarbonIntensity
ukci = UKCarbonIntensity()
ukci.query("/intensity")
df = ukci.to_df()
df
Output:
| | from | to | forecast | actual | index |
|---|---|---|---|---|---|
| 0 | 2021-08-10T11:00Z | 2021-08-10T11:30Z | 211 | 216 | moderate |
The above df is a pandas DataFrame object. It contains data downloaded by viadot from the UK Carbon Intensity API.
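To make the shape of that DataFrame more concrete, here is a minimal sketch of how a nested API response could be flattened into the columns shown above. It does not use viadot and is not how UKCarbonIntensity is implemented. The JSON layout (a data list whose entries hold from, to, and a nested intensity object) is an assumption about the API response, and flatten_intensity is a hypothetical helper. The values are copied from the output table above.

```python
import json

# Assumed response layout; the values are taken from the output table above.
SAMPLE_RESPONSE = """
{"data": [{"from": "2021-08-10T11:00Z",
           "to": "2021-08-10T11:30Z",
           "intensity": {"forecast": 211, "actual": 216, "index": "moderate"}}]}
"""


def flatten_intensity(payload):
    """Turn each entry of payload["data"] into one flat row (hypothetical helper)."""
    rows = []
    for entry in payload["data"]:
        row = {"from": entry["from"], "to": entry["to"]}
        # Lift the nested intensity fields to the top level of the row.
        row.update(entry["intensity"])
        rows.append(row)
    return rows


rows = flatten_intensity(json.loads(SAMPLE_RESPONSE))
print(rows)
```

Each dictionary in rows corresponds to one row of the table, so a list like this could be passed to pandas.DataFrame to obtain a comparable frame.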
Loading Data to a Source
Depending on the source, viadot provides different methods of uploading data. For instance, for SQL sources, this would be bulk inserts. For data lake sources, it would be a file upload. We also provide ready-made pipelines including data validation steps using Great Expectations.
An example of loading data into SQLite from a pandas DataFrame using the SQLiteInsert Prefect task:
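from viadot.tasks import SQLiteInsert
# TABLE_NAME, dtypes, and database_path are placeholders that you define yourself;
# df is a pandas DataFrame, e.g. the one created in the previous section.
insert_task = SQLiteInsert()
insert_task.run(table_name=TABLE_NAME, dtypes=dtypes, db_path=database_path, df=df, if_exists="replace")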
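For readers unfamiliar with bulk inserts, the following is a minimal sketch of the idea using only Python's standard sqlite3 module. It is not the implementation of SQLiteInsert. The bulk_insert function is hypothetical; its parameter names merely mirror the call above, and it assumes that dtypes maps column names to SQL types and that rows is a list of dictionaries.

```python
import sqlite3


def bulk_insert(db_path, table_name, dtypes, rows, if_exists="replace"):
    """Create table_name from a {column: SQL type} mapping and insert all rows in one batch.

    Hypothetical helper for illustration only. Identifiers are interpolated
    into the SQL text, so they must come from trusted input.
    """
    columns = list(dtypes)
    # Quote identifiers: column names such as "from" and "index" are SQL keywords.
    col_defs = ", ".join(f'"{col}" {sql_type}' for col, sql_type in dtypes.items())
    col_names = ", ".join(f'"{col}"' for col in columns)
    placeholders = ", ".join("?" for _ in columns)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            if if_exists == "replace":
                conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({col_defs})')
            # executemany sends all rows through a single prepared statement.
            conn.executemany(
                f'INSERT INTO "{table_name}" ({col_names}) VALUES ({placeholders})',
                [tuple(row[col] for col in columns) for row in rows],
            )
    finally:
        conn.close()
```

With if_exists="replace", calling the function twice with the same rows leaves a single copy of the data, because the table is dropped and recreated each time. Any other value appends to the existing table.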
Running tests
To run tests, log into the container and run pytest:
cd viadot/docker
./run.sh
docker exec -it viadot_testing bash
pytest
Running flows locally
You can run the example flows from the terminal:
cd viadot/docker
./run.sh
docker exec -it viadot_testing bash
FLOW_NAME=hello_world; python -m viadot.examples.$FLOW_NAME
However, when developing, the easiest way is to use the provided Jupyter Lab container, available at http://localhost:9000/.
How to contribute
- Clone the release branch
- Pull the docker env by running viadot/docker/update.sh -t dev
- Run the env with viadot/docker/run.sh
- Log into the dev container and install viadot in development mode, so that code changes take effect without reinstalling:
docker exec -it viadot_testing bash
pip install -e .
- Edit and test your changes with pytest
- Submit a PR. The PR should contain the following:
  - new/changed functionality
  - tests for the changes
  - changes added to CHANGELOG.md
  - any other relevant resources updated (esp. viadot/docs)
Please follow the standards and best practices used within the library (e.g. when adding tasks, see how other tasks are constructed). For any questions, please reach out to us here on GitHub.
Style guidelines
- the code should be formatted with Black using default settings (easiest way is to use the VSCode extension)
- commit messages should:
  - begin with an emoji
  - start with one of the following verbs, capitalized, immediately after the summary emoji: "Added", "Updated", "Removed", "Fixed", "Renamed", and, sporadically, other ones, such as "Upgraded", "Downgraded", or whatever you find relevant for your particular situation
  - contain a useful description of what the commit is doing
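The commit message rules above can be checked mechanically. Below is a minimal sketch of such a check, not a tool shipped with viadot. The check_commit_message function is hypothetical, and it assumes the emoji is a single Unicode character in the "So" (symbol, other) category, so multi-codepoint emoji and :shortcode: style emoji are not recognised. Because the list of verbs is open-ended, the sketch only verifies that the first word is capitalized.

```python
import unicodedata
from typing import List


def check_commit_message(message: str) -> List[str]:
    """Return a list of style problems found in a commit message (hypothetical helper)."""
    text = message.strip()
    # Assumption: a leading emoji is one character in Unicode category "So".
    if not text or unicodedata.category(text[0]) != "So":
        return ["message should begin with an emoji"]

    problems = []
    # Skip an optional variation selector and spaces after the emoji.
    words = text[1:].lstrip("\ufe0f ").split(maxsplit=1)
    if not words or not words[0][0].isupper():
        problems.append("emoji should be followed by a capitalized verb, e.g. 'Added'")
    if len(words) < 2:
        problems.append("message should describe what the commit is doing")
    return problems


print(check_commit_message("✨ Added SQLiteInsert task"))  # → []
print(check_commit_message("fixed a bug"))
```

A check like this could run in a commit-msg hook, although reviewing messages by eye works equally well for a small project.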