# Viadot

Documentation: https://dyvenia.github.io/viadot/

Source Code: https://github.com/dyvenia/viadot

A simple data ingestion library to guide data flows from some places to other places.
## Getting Data from a Source

Viadot supports several API and RDBMS sources, both private and public. The examples below use the public UK Carbon Intensity API.
```python
from viadot.sources.uk_carbon_intensity import UKCarbonIntensity

# Query the current carbon intensity and load the result into a DataFrame.
ukci = UKCarbonIntensity()
ukci.query("/intensity")
df = ukci.to_df()

print(df)
```
Output:

|   | from              | to                | forecast | actual | index    |
|---|-------------------|-------------------|----------|--------|----------|
| 0 | 2021-08-10T11:00Z | 2021-08-10T11:30Z | 211      | 216    | moderate |
The above `df` is a pandas `DataFrame` object. It contains data downloaded by `viadot` from the Carbon Intensity UK API.
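Because the result is a regular pandas `DataFrame`, you can post-process it with standard pandas operations. For example (a generic pandas sketch, not viadot-specific; the data below mirrors the output shown above):

```python
import pandas as pd

# A DataFrame with the same shape as the UKCarbonIntensity output above.
df = pd.DataFrame(
    {
        "from": ["2021-08-10T11:00Z"],
        "to": ["2021-08-10T11:30Z"],
        "forecast": [211],
        "actual": [216],
        "index": ["moderate"],
    }
)

# Parse the timestamp columns and compute the absolute forecast error.
df["from"] = pd.to_datetime(df["from"])
df["to"] = pd.to_datetime(df["to"])
df["error"] = (df["actual"] - df["forecast"]).abs()

print(df[["from", "error"]])
```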
## Loading Data to a Source

Depending on the source, `viadot` provides different methods of uploading data. For instance, for SQL sources, this would be bulk inserts. For data lake sources, it would be a file upload. For ready-made pipelines, including data validation steps using `dbt`, see prefect-viadot.
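To illustrate the bulk-insert pattern mentioned above (a generic sketch using pandas and the standard-library SQLite driver, not viadot's own loader; table and column names here are made up for the example):

```python
import sqlite3

import pandas as pd

# Example data to load, mirroring the carbon intensity output above.
df = pd.DataFrame(
    {
        "from_ts": ["2021-08-10T11:00Z"],
        "to_ts": ["2021-08-10T11:30Z"],
        "forecast": [211],
        "actual": [216],
    }
)

# Bulk-insert the whole DataFrame into a SQL table in a single call,
# rather than issuing one INSERT statement per row.
with sqlite3.connect(":memory:") as conn:
    df.to_sql("carbon_intensity", conn, index=False, if_exists="replace")
    rows = conn.execute("SELECT COUNT(*) FROM carbon_intensity").fetchone()[0]
    print(rows)
```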
## Getting started

### Prerequisites

We assume that you have Docker installed.
### Installation

Clone the `2.0` branch, then set up and run the environment:

```bash
git clone https://github.com/dyvenia/viadot.git -b 2.0 && \
cd viadot/docker && \
sh update.sh && \
sh run.sh && \
cd ../
```
### Configuration

In order to start using sources, you must configure them with the required credentials. Credentials can be specified either in the viadot config file (by default, `$HOME/.config/viadot/config.yaml`) or passed directly to each source's `credentials` parameter.

You can find specific information about each source's credentials in the documentation.
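For a rough idea of what such a config file could contain, here is a purely illustrative fragment. The source name, class, and credential keys below are hypothetical; consult the documentation for the actual schema required by each source:

```yaml
# $HOME/.config/viadot/config.yaml -- illustrative only, not the real schema.
sources:
  - my_sql_source:              # hypothetical source name
      class: SQLServer          # hypothetical source class
      credentials:
        user: my_user
        password: "***"
        server: my_server.example.com
        db_name: my_database
```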