Skip to main content

Ipython extension for VDK

Project description

vdk-ipython

monthly download count for vdk-ipython

Ipython extension for VDK

This extension introduces a magic command for Jupyter. The command enables the user to load job_input for his current data job and use it freely while working with Jupyter.

See more about magic commands: https://ipython.readthedocs.io/en/stable/interactive/magics.html

Installation

To use the extension it must be firstly installed with pip as a python package.

pip install vdk-ipython

Usage

Then to load the extension in Jupyter the user should use:

%reload_ext vdk.plugin.ipython

Data Job Python coding cells

And to load the VDK (Job Control object):

%reload_VDK

The %reload_VDK magic can be used with arguments:

Argument Description
--path the path of the data job. Usually you want to leave the default (the directory of Notebook file)
--name the name of the data job. Usually you want to leave the default (the directory name of the Notebook file)
--arguments Arguments (in json format) to be passed to the job
--log-level-vdk The log level of the VDK logs

Data Job SQL Cells

You can also specify %%vdksql cell magic to convert cell into SQL cell which will using Job Input Managed Connection

%vdksql
select * from my_table

The output of cell will be a table. If ipyaggrid is installed then the table would ipyaggrid type of table which allows filtering, search and other cool things

Example

The output of this example is "myjob"

%reload_ext vdk.plugin.ipython

%reload_VDK --name=myjob
response = requests.get("https://jsonplaceholder.typicode.com/todos/1")

job_input.send_object_for_ingestion(
    payload=response.json(), destination_table="placeholder_todo"
)
%%vdksql
select * from placeholder_todo
where completed = True

Ingesting data with %%vdkingest

%%vdkingest

# Data Source Configuration
[sources.yourSourceId]
## Data Source Name. Installed dta sources can be seen using vdk data-sources --list
name = "<data-source-name>"
## The singer tap we will use
config = {
    ## Set the configuration for the data source.
    ## You can see what config options are supported with vdk data-sources --config <data-source-name>
}

[sources.yourSourceId_2]
# repeat this for as many sources you want
# ...

# Data Destination Configuration.
## Ingestion methods and targets are the same one as those accepted by send_object_for_ingestion
## See https://github.com/vmware/versatile-data-kit/blob/main/projects/vdk-core/src/vdk/api/job_input.py#L183
[destinations.yourDestinationId]
## the only required parameter is method
method = "<method-name>"
## Optionally specify target
## target =

[destinations.yourDestinationId_2]
# repeat this for as many destinations you want
# ...

# Data Flows from Source to Destination
[[flows]]
from = "yourSourceId"
to = "yourDestinationId"

[[flows]]
from = "yourSourceId_2"
to = "yourDestinationId_2"

Complete the full self-paced tutorial at https://bit.ly/vdk-ingest

Build and testing

pip install -r requirements.txt
pip install -e .
pytest

In VDK repo ../build-plugin.sh script can be used also.

Note about the CICD:

.plugin-ci.yaml is needed only for plugins part of Versatile Data Kit Plugin repo.

The CI/CD is separated in two stages, a build stage and a release stage. The build stage is made up of a few jobs, all which inherit from the same job configuration and only differ in the Python version they use (3.7, 3.8, 3.9, 3.10 and 3.11). They run according to rules, which are ordered in a way such that changes to a plugin's directory trigger the plugin CI, but changes to a different plugin does not.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vdk-ipython-0.2.1190994517.tar.gz (11.4 kB view details)

Uploaded Source

File details

Details for the file vdk-ipython-0.2.1190994517.tar.gz.

File metadata

  • Download URL: vdk-ipython-0.2.1190994517.tar.gz
  • Upload date:
  • Size: 11.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.13

File hashes

Hashes for vdk-ipython-0.2.1190994517.tar.gz
Algorithm Hash digest
SHA256 3928894bf20a1f788726058881a0743563665ff41b9546d05d56ccf2f9f195f3
MD5 e22247023a4e293b73de00b02afd99b7
BLAKE2b-256 aa79ff2a01f43e79f41894b3388bfdbc8404322fc636e672cb49e99689f8b81b

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page