Skip to main content

A wrapper around huggingface datasets, invoking an IPFS model manager.

Project description

Scaling Ethereum Hackathon Presents

About

This is a model manager and wrapper for huggingface, looks up a index of models from an collection of models, and will download a model from either https/s3/ipfs, depending on which source is the fastest.

How to use

pip install .

look run python3 example.py for examples of usage.

this is designed to be a drop in replacement, which requires only 2 lines to be changed

In your python script

import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
from ipfs_transformers import AutoModel
from datasets import load_dataset
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset

model = AutoModel.from_auto_download("bge-small-en-v1.5")
dataset = auto_download_dataset('Caselaw_Access_Project_JSON')
knnindex = auto_download_faiss_index('Caselaw_Access_Project_FAISS_index')
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vectors, k=10)

or

import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
from ipfs_transformers import AutoModel
from datasets import load_dataset
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset

model = AutoModel.from_ipfs("QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1")
dataset = ipfs_download_dataset('QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1')
knnindex = ipfs_download_faiss_index('QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1')
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vectors, k=10)

or to use with with s3 caching

import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
from ipfs_transformers import AutoModel
from datasets import load_dataset
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset
s3cfg = {
        "bucket": "cloud",
        "endpoint": "https://storage.googleapis.com",
        "secret_key": "",
        "access_key": ""
    }

model = AutoModel.from_auto_download(
    "bge-small-en-v1.5",
    s3cfg=s3cfg
)
dataset = load_dataset.from_auto_download(
    dataset_name="Caselaw_Access_Project_JSON",
    s3cfg=s3cfg
)
knnindex = ipfs_download_faiss_index.from_auto_download(
    dataset_name="Caselaw_Access_Project_FAISS_index",
    s3cfg=s3cfg
)
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vectors, k=10)

The following JSON files have been uploaded to web3storage https://huggingface.co/datasets/endomorphosis/Caselaw_Access_Project_JSON or use this pin syncer notice the pins located in pin_store

python3 upload_pins.py

will upload them to web3storage, pinata, filebase, and lighthouse storage and there will instear be a web3storage_pins.tsv file with the new pins to import into datasets ... but only web3stoage is working right now.

IPFS Huggingface Bridge:

for huggingface datasets python library visit: https://github.com/endomorphosis/ipfs_datasets/

for transformers python library visit: https://github.com/endomorphosis/ipfs_transformers/

for transformers js client visit:
https://github.com/endomorphosis/ipfs_transformers_js/

for orbitdbkit nodejs library visit: https://github.com/endomorphosis/orbitdb_kit/

for fireproof_kit nodejs library visit: https://github.com/endomorphosis/fireproof_kit

for python model manager library visit: https://github.com/endomorphosis/ipfs_model_manager/

for nodejs model manager library visit: https://github.com/endomorphosis/ipfs_model_manager_js/

for nodejs ipfs huggingface scraper with pinning services visit: https://github.com/endomorphosis/ipfs_huggingface_scraper/

Author - Benjamin Barber QA - Kevin De Haan

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ipfs_faiss_py-0.0.6.tar.gz (15.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ipfs_faiss_py-0.0.6-py3-none-any.whl (15.7 kB view details)

Uploaded Python 3

File details

Details for the file ipfs_faiss_py-0.0.6.tar.gz.

File metadata

  • Download URL: ipfs_faiss_py-0.0.6.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for ipfs_faiss_py-0.0.6.tar.gz
Algorithm Hash digest
SHA256 5e171900539cfbb46608ed1de727a7657b328ad3cf30c6e14e66863bd3757c65
MD5 8769c2841e41fb788c0a69dd673b53ff
BLAKE2b-256 d555ee212385de5deae6a96c0f55431678427a979ad6770795f2f851166ceb2f

See more details on using hashes here.

File details

Details for the file ipfs_faiss_py-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: ipfs_faiss_py-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 15.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for ipfs_faiss_py-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 dda4331de7f8fb96767f1facfb30e584c8641a26b7fe1deaedc7f4025052f816
MD5 d1b4c1468a3c2f984ff21d45a807a639
BLAKE2b-256 3ae3242ab4c8de65c8eb7cc63cc0b253633050153e4905200890b4a5755e2669

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page