Skip to main content

A wrapper around huggingface datasets, invoking an IPFS model manager.

Project description

Scaling Ethereum Hackathon Presents

About

This is a model manager and wrapper for huggingface, looks up a index of models from an collection of models, and will download a model from either https/s3/ipfs, depending on which source is the fastest.

How to use

pip install .

look run python3 example.py for examples of usage.

this is designed to be a drop in replacement, which requires only 2 lines to be changed

In your python script

import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
from ipfs_transformers import AutoModel
from datasets import load_dataset
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset

model = AutoModel.from_auto_download("bge-small-en-v1.5")
dataset = auto_download_dataset('Caselaw_Access_Project_JSON')
knnindex = auto_download_faiss_index('Caselaw_Access_Project_FAISS_index')
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vectors, k=10)

or

import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
from ipfs_transformers import AutoModel
from datasets import load_dataset
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset

model = AutoModel.from_ipfs("QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1")
dataset = ipfs_download_dataset('QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1')
knnindex = ipfs_download_faiss_index('QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1')
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vectors, k=10)

or to use with with s3 caching

import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
from ipfs_transformers import AutoModel
from datasets import load_dataset
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset
s3cfg = {
        "bucket": "cloud",
        "endpoint": "https://storage.googleapis.com",
        "secret_key": "",
        "access_key": ""
    }

model = AutoModel.from_auto_download(
    "bge-small-en-v1.5",
    s3cfg=s3cfg
)
dataset = load_dataset.from_auto_download(
    dataset_name="Caselaw_Access_Project_JSON",
    s3cfg=s3cfg
)
knnindex = ipfs_download_faiss_index.from_auto_download(
    dataset_name="Caselaw_Access_Project_FAISS_index",
    s3cfg=s3cfg
)
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vectors, k=10)

The following JSON files have been uploaded to web3storage https://huggingface.co/datasets/endomorphosis/Caselaw_Access_Project_JSON or use this pin syncer notice the pins located in pin_store

python3 upload_pins.py

will upload them to web3storage, pinata, filebase, and lighthouse storage and there will instear be a web3storage_pins.tsv file with the new pins to import into datasets ... but only web3stoage is working right now.

IPFS Huggingface Bridge:

for huggingface datasets python library visit: https://github.com/endomorphosis/ipfs_datasets/

for transformers python library visit: https://github.com/endomorphosis/ipfs_transformers/

for transformers js client visit:
https://github.com/endomorphosis/ipfs_transformers_js/

for orbitdbkit nodejs library visit: https://github.com/endomorphosis/orbitdb_kit/

for fireproof_kit nodejs library visit: https://github.com/endomorphosis/fireproof_kit

for python model manager library visit: https://github.com/endomorphosis/ipfs_model_manager/

for nodejs model manager library visit: https://github.com/endomorphosis/ipfs_model_manager_js/

for nodejs ipfs huggingface scraper with pinning services visit: https://github.com/endomorphosis/ipfs_huggingface_scraper/

Author - Benjamin Barber QA - Kevin De Haan

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ipfs_faiss_py-0.0.4.tar.gz (15.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ipfs_faiss_py-0.0.4-py3-none-any.whl (15.7 kB view details)

Uploaded Python 3

File details

Details for the file ipfs_faiss_py-0.0.4.tar.gz.

File metadata

  • Download URL: ipfs_faiss_py-0.0.4.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for ipfs_faiss_py-0.0.4.tar.gz
Algorithm Hash digest
SHA256 deffc973e9cf23f06dca53f9fc873f006b3ebdc1b8219e127b7beea7ac8bf6cd
MD5 923b13afb3ce6e8de128f9be0ece96bf
BLAKE2b-256 67674598cb89b944f02b361ded5091a90c41b52dc89339cd95ef3bc77589f0ca

See more details on using hashes here.

File details

Details for the file ipfs_faiss_py-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: ipfs_faiss_py-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 15.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for ipfs_faiss_py-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 ec4208c2b8432b5f7b1bc24246fd22ad1deff5b166153ff1be7b1f5ed84de3f6
MD5 4739659fdae71dbe8c0256f3cecd9543
BLAKE2b-256 8de124e21b357305fcf6028dd595e71feeff3d506a9f8917e04de0ac6a49e971

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page