Skip to main content

A wrapper around huggingface datasets, invoking an IPFS model manager.

Project description

Scaling Ethereum Hackathon Presents

About

This is a model manager and wrapper for huggingface, looks up a index of models from an collection of models, and will download a model from either https/s3/ipfs, depending on which source is the fastest.

How to use

pip install .

look run python3 example.py for examples of usage.

this is designed to be a drop in replacement, which requires only 2 lines to be changed

In your python script

import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
from ipfs_transformers import AutoModel
from datasets import load_dataset
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset

model = AutoModel.from_auto_download("bge-small-en-v1.5")
dataset = auto_download_dataset('Caselaw_Access_Project_JSON')
knnindex = auto_download_faiss_index('Caselaw_Access_Project_FAISS_index')
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vectors, k=10)

or

import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
from ipfs_transformers import AutoModel
from datasets import load_dataset
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset

model = AutoModel.from_ipfs("QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1")
dataset = ipfs_download_dataset('QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1')
knnindex = ipfs_download_faiss_index('QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1')
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vectors, k=10)

or to use with with s3 caching

import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
from ipfs_transformers import AutoModel
from datasets import load_dataset
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset
s3cfg = {
        "bucket": "cloud",
        "endpoint": "https://storage.googleapis.com",
        "secret_key": "",
        "access_key": ""
    }

model = AutoModel.from_auto_download(
    "bge-small-en-v1.5",
    s3cfg=s3cfg
)
dataset = load_dataset.from_auto_download(
    dataset_name="Caselaw_Access_Project_JSON",
    s3cfg=s3cfg
)
knnindex = ipfs_download_faiss_index.from_auto_download(
    dataset_name="Caselaw_Access_Project_FAISS_index",
    s3cfg=s3cfg
)
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vectors, k=10)

The following JSON files have been uploaded to web3storage https://huggingface.co/datasets/endomorphosis/Caselaw_Access_Project_JSON or use this pin syncer notice the pins located in pin_store

python3 upload_pins.py

will upload them to web3storage, pinata, filebase, and lighthouse storage and there will instear be a web3storage_pins.tsv file with the new pins to import into datasets ... but only web3stoage is working right now.

IPFS Huggingface Bridge:

for huggingface datasets python library visit: https://github.com/endomorphosis/ipfs_datasets/

for transformers python library visit: https://github.com/endomorphosis/ipfs_transformers/

for transformers js client visit:
https://github.com/endomorphosis/ipfs_transformers_js/

for orbitdbkit nodejs library visit: https://github.com/endomorphosis/orbitdb_kit/

for fireproof_kit nodejs library visit: https://github.com/endomorphosis/fireproof_kit

for python model manager library visit: https://github.com/endomorphosis/ipfs_model_manager/

for nodejs model manager library visit: https://github.com/endomorphosis/ipfs_model_manager_js/

for nodejs ipfs huggingface scraper with pinning services visit: https://github.com/endomorphosis/ipfs_huggingface_scraper/

Author - Benjamin Barber QA - Kevin De Haan

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ipfs_faiss_py-0.0.3.tar.gz (15.6 kB view details)

Uploaded Source

Built Distribution

ipfs_faiss_py-0.0.3-py3-none-any.whl (15.7 kB view details)

Uploaded Python 3

File details

Details for the file ipfs_faiss_py-0.0.3.tar.gz.

File metadata

  • Download URL: ipfs_faiss_py-0.0.3.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for ipfs_faiss_py-0.0.3.tar.gz
Algorithm Hash digest
SHA256 5d0e4179e8ed6c3bc681b42fdf3111c11eea47d0db6c3f3e8d2b3920ca188225
MD5 ee5c18d9125954aa9dbfd57d465ea27d
BLAKE2b-256 b6073bf8f8736287b38507e1d4a36ff8132cd570790feb0be3d226fe69cd4518

See more details on using hashes here.

File details

Details for the file ipfs_faiss_py-0.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for ipfs_faiss_py-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f8d7b0c5dbc8d11ea08c0afaf3584a0ac876d1ca19fc72fec54fd197a9ef2ab6
MD5 0669c4fa89c28cadc08e8afe03ca45c8
BLAKE2b-256 36a4b2afa89bc8ebf2283060e2b39085114183d5a3a78bf1dc97ee6b00de5e57

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page