A wrapper around huggingface datasets, invoking an IPFS model manager.

IPFS Huggingface Datasets

This is a model manager and wrapper for Hugging Face datasets. It looks up an index of models from a collection of models and downloads a model over HTTPS, S3, or IPFS, depending on which source is the fastest.
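To illustrate the "fastest source" idea, here is a minimal sketch, not the library's actual implementation. It assumes each source can be checked with a small probe callable (for example, a request for a few bytes); the function name `pick_fastest_source` and the probe callables are hypothetical and not part of ipfs_datasets.

```python
import time
from typing import Callable, Dict, Optional


def pick_fastest_source(probes: Dict[str, Callable[[], object]]) -> Optional[str]:
    """Return the name of the probe that completes fastest.

    `probes` maps a source name (e.g. "https", "s3", "ipfs") to a
    hypothetical callable that touches that source. Probes that raise
    are treated as unreachable and skipped. Returns None if every
    probe fails.
    """
    timings: Dict[str, float] = {}
    for name, probe in probes.items():
        start = time.perf_counter()
        try:
            probe()
        except Exception:
            # An unreachable or misconfigured source is not a candidate.
            continue
        timings[name] = time.perf_counter() - start

    if not timings:
        return None
    # The source with the smallest elapsed time wins.
    return min(timings, key=timings.get)
```

A real manager would likely probe sources concurrently and cache the result; this sequential version only shows the selection step.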

How to use

pip install .

Run python3 example.py for examples of usage.

This is designed to be a drop-in replacement, which requires only 2 lines to be changed.

In your Python script:

# from datasets import load_dataset  (original import, replaced by the line below)
from ipfs_datasets import load_dataset
dataset = load_dataset.from_auto_download("bge-small-en-v1.5")

or

# from datasets import load_dataset  (original import, replaced by the line below)
from ipfs_datasets import load_dataset
dataset = load_dataset.from_ipfs("QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1")

or, to use with S3 caching

# from datasets import load_dataset  (original import, replaced by the line below)
from ipfs_datasets import load_dataset
dataset = load_dataset.from_auto_download(
    dataset_name="common-crawl",
    s3cfg={
        "bucket": "cloud",
        "endpoint": "https://storage.googleapis.com",
        "secret_key": "",
        "access_key": ""
    }
)
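Since the example above leaves `secret_key` and `access_key` blank, one option is to fill the `s3cfg` dictionary from environment variables instead of hard-coding credentials. The sketch below assumes the four keys shown above; the environment variable names (`S3_BUCKET`, `S3_ENDPOINT`, `S3_SECRET_KEY`, `S3_ACCESS_KEY`) and both helper functions are hypothetical and not defined by ipfs_datasets.

```python
import os
from typing import Dict, List, Mapping


def s3cfg_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build an s3cfg dict with the same four keys as the example above.

    The environment variable names are assumptions made for this sketch.
    Missing variables become empty strings.
    """
    return {
        "bucket": env.get("S3_BUCKET", ""),
        "endpoint": env.get("S3_ENDPOINT", ""),
        "secret_key": env.get("S3_SECRET_KEY", ""),
        "access_key": env.get("S3_ACCESS_KEY", ""),
    }


def missing_s3_fields(s3cfg: Mapping[str, str]) -> List[str]:
    """Return the keys whose values are empty, in dictionary order."""
    return [key for key, value in s3cfg.items() if not value]
```

The resulting dictionary can then be passed as the `s3cfg` argument in the call shown above, after checking that `missing_s3_fields` returns an empty list.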

IPFS Huggingface Bridge:

For the transformers Python library, visit: https://github.com/endomorphosis/ipfs_transformers/

For the transformers JS client, visit: https://github.com/endomorphosis/ipfs_transformers_js/

For the orbitdb_kit Node.js library, visit: https://github.com/endomorphosis/orbitdb_kit/

For the fireproof_kit Node.js library, visit: https://github.com/endomorphosis/fireproof_kit

For the Faiss KNN index Python library, visit: https://github.com/endomorphosis/ipfs_faiss/

For the Python model manager library, visit: https://github.com/endomorphosis/ipfs_model_manager/

For the Node.js model manager library, visit: https://github.com/endomorphosis/ipfs_model_manager_js/

For the Node.js IPFS Hugging Face scraper with pinning services, visit: https://github.com/endomorphosis/ipfs_huggingface_scraper/

Author - Benjamin Barber
QA - Kevin De Haan
