
A wrapper around huggingface datasets, invoking an IPFS model manager.


IPFS Huggingface Datasets

This is a model manager and wrapper for Hugging Face datasets. It looks up an index of models from a collection of models and downloads a model over HTTPS, S3, or IPFS, depending on which source is the fastest.
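The "fastest source" idea can be illustrated with a small sketch. This is not the package's actual selection logic, which is not shown here; the function name `pick_fastest_source` and the probe callables are hypothetical, and the probes below only simulate latency instead of making real network requests.

```python
import time
from typing import Callable, Dict, Optional


def pick_fastest_source(probes: Dict[str, Callable[[], None]]) -> Optional[str]:
    """Return the name of the source whose probe completed fastest.

    Each probe is a zero-argument callable that performs a small test
    request and raises an exception if the source is unreachable.
    Returns None when no source could be reached.
    """
    timings: Dict[str, float] = {}
    for name, probe in probes.items():
        start = time.perf_counter()
        try:
            probe()
        except Exception:
            continue  # unreachable sources are skipped
        timings[name] = time.perf_counter() - start
    if not timings:
        return None
    return min(timings, key=timings.get)


def make_fake_probe(delay: float, fail: bool = False) -> Callable[[], None]:
    """Build a stand-in probe that sleeps instead of touching the network."""
    def probe() -> None:
        if fail:
            raise ConnectionError("source unreachable")
        time.sleep(delay)
    return probe


# Hypothetical latencies, for illustration only.
demo_probes = {
    "https": make_fake_probe(0.05),
    "s3": make_fake_probe(0.01),
    "ipfs": make_fake_probe(0.0, fail=True),
}
fastest = pick_fastest_source(demo_probes)
print(fastest)
```

In a real setup each probe would issue a small request against the corresponding backend. Probing sequentially keeps the sketch simple, at the cost of waiting for the slowest reachable source.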

How to use

pip install .

Run python3 example.py for examples of usage.

This is designed to be a drop-in replacement that requires only two lines to be changed.

In your Python script:

# Original Hugging Face import; the next line rebinds load_dataset
from datasets import load_dataset
from ipfs_datasets import load_dataset
dataset = load_dataset.from_auto_download("bge-small-en-v1.5")

or

from datasets import load_dataset
from ipfs_datasets import load_dataset
dataset = load_dataset.from_ipfs("QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1")

or, to use it with S3 caching:

from datasets import load_dataset
from ipfs_datasets import load_dataset
dataset = load_dataset.from_auto_download(
    dataset_name="common-crawl",
    s3cfg={
        "bucket": "cloud",
        "endpoint": "https://storage.googleapis.com",
        "secret_key": "",
        "access_key": ""
    }
)
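The example above leaves "secret_key" and "access_key" empty. One way to avoid hardcoding credentials is to build the s3cfg dictionary from environment variables. The sketch below assumes hypothetical variable names of the form IPFS_DATASETS_S3_*; the package itself does not define them, and only the four dictionary keys come from the example above.

```python
import os
from typing import Dict, Mapping

# Keys taken from the s3cfg example above.
S3CFG_KEYS = ("bucket", "endpoint", "access_key", "secret_key")


def s3cfg_from_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build an s3cfg dict from environment variables.

    The IPFS_DATASETS_S3_<KEY> variable names are an assumption made
    for this sketch; they are not defined by the package.
    """
    cfg = {key: env.get(f"IPFS_DATASETS_S3_{key.upper()}", "") for key in S3CFG_KEYS}
    missing = [key for key, value in cfg.items() if not value]
    if missing:
        raise ValueError(f"missing S3 settings: {', '.join(missing)}")
    return cfg


# Demonstration with an explicit mapping instead of the real environment.
example_env = {
    "IPFS_DATASETS_S3_BUCKET": "cloud",
    "IPFS_DATASETS_S3_ENDPOINT": "https://storage.googleapis.com",
    "IPFS_DATASETS_S3_ACCESS_KEY": "example-access-key",
    "IPFS_DATASETS_S3_SECRET_KEY": "example-secret-key",
}
s3cfg = s3cfg_from_env(example_env)
print(sorted(s3cfg))
```

The resulting dictionary has the same shape as the s3cfg argument passed to load_dataset.from_auto_download in the example above.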

IPFS Huggingface Bridge:

for transformers python library visit: https://github.com/endomorphosis/ipfs_transformers/

for transformers js client visit: https://github.com/endomorphosis/ipfs_transformers_js/

for orbitdb_kit nodejs library visit: https://github.com/endomorphosis/orbitdb_kit/

for fireproof_kit nodejs library visit: https://github.com/endomorphosis/fireproof_kit

for Faiss KNN index python library visit: https://github.com/endomorphosis/ipfs_faiss/

for python model manager library visit: https://github.com/endomorphosis/ipfs_model_manager/

for nodejs model manager library visit: https://github.com/endomorphosis/ipfs_model_manager_js/

for nodejs ipfs huggingface scraper with pinning services visit: https://github.com/endomorphosis/ipfs_huggingface_scraper/

Author - Benjamin Barber
QA - Kevin De Haan
