Integrate VDK with Huggingface as both data source and target

Huggingface

Versatile Data Kit (VDK) plugin for integrating with Huggingface as both a data source and a target. This plugin allows you to ingest data payloads into a Huggingface repository and makes it easier to work with datasets stored in Huggingface.

Usage

pip install vdk-huggingface

The plugin adds a new ingestion method, "huggingface", which can be used like this:

job_input.send_object_for_ingestion(data, method="huggingface")
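
For context, the call above normally lives inside a Python step of a VDK data job. The following is a minimal sketch of such a step, not the plugin's reference usage: the step file name and the sample rows are hypothetical, and it assumes the HuggingFace token and repository ID are already configured as described under Configuration.

```python
# 10_ingest_to_huggingface.py (hypothetical step file name)

# Hypothetical sample records; a real job would read these from its own source.
SAMPLE_ROWS = [
    {"id": 1, "text": "first example"},
    {"id": 2, "text": "second example"},
]


def run(job_input):
    """Entry point that VDK calls for a Python step of a data job.

    In a real job, job_input is the IJobInput object that VDK passes in.
    """
    for row in SAMPLE_ROWS:
        # Each payload is a plain dict; method="huggingface" selects
        # the ingestion method added by this plugin.
        job_input.send_object_for_ingestion(payload=row, method="huggingface")
```

Sending one dict per record keeps the step simple; whether and how payloads are batched before upload is left to the plugin.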

Configuration

(vdk config-help is a useful command for browsing all config options of your vdk installation.)

Name | Description | (example) Value
HUGGINGFACE_TOKEN | HuggingFace API token for authentication. Get one from HuggingFace Settings | ""
HUGGINGFACE_REPO_ID | HuggingFace Dataset repository ID | "username/test-dataset"
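
Before running a job, it can help to check that both options are actually set. The sketch below is a hypothetical pre-flight helper and not part of the plugin. It assumes the options are supplied as environment variables with a VDK_ prefix (for example VDK_HUGGINGFACE_TOKEN); confirm how your installation expects them with vdk config-help.

```python
import os

# Option names taken from the configuration table above.
REQUIRED_OPTIONS = ("HUGGINGFACE_TOKEN", "HUGGINGFACE_REPO_ID")


def missing_huggingface_options(environ=None, prefix="VDK_"):
    """Return the option names that are unset or empty.

    The "VDK_" prefix is an assumption about how environment variables
    map to VDK config options; pass a different prefix if yours differs.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_OPTIONS if not env.get(prefix + name)]


if __name__ == "__main__":
    missing = missing_huggingface_options()
    if missing:
        print("Missing configuration:", ", ".join(missing))
    else:
        print("HuggingFace configuration looks complete.")
```

The helper only checks that values are present; it does not verify that the token is valid or that the repository exists.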

Build and testing

pip install -r requirements.txt
pip install -e .
pytest

In the VDK repo, the ../build-plugin.sh script can also be used.

Note about the CI/CD:

.plugin-ci.yaml is needed only for plugins that are part of the Versatile Data Kit Plugin repo.

The CI/CD is separated into two stages: a build stage and a release stage. The build stage is made up of a few jobs, all of which inherit from the same job configuration and differ only in the Python version they use (3.7, 3.8, 3.9 and 3.10). They run according to rules, which are ordered so that changes to a plugin's directory trigger that plugin's CI, but changes to a different plugin do not.


Source Distribution

vdk-huggingface-0.1.1190994517.tar.gz (4.8 kB)

Hashes for vdk-huggingface-0.1.1190994517.tar.gz
Algorithm Hash digest
SHA256 9aab3ea8ac43f9d93c7d94c5684c2cd1097bac549fa5bdd53ff8383274c57121
MD5 8b2fe83984965bb0e4b701e1628fa0c7
BLAKE2b-256 4e84ae3a0ea209370e85d90a74b286220384753d4153b05fe751ff7c9cee12fe
