A wrapper around huggingface datasets, invoking an IPFS model manager.
Scaling Ethereum Hackathon Presents
About
This is a model manager and wrapper for Hugging Face. It looks up each model in an index built from a collection of models and downloads it over HTTPS, S3, or IPFS, depending on which source is the fastest.
How to use
pip install .
Run python3 example.py for examples of usage.
This is designed to be a drop-in replacement that requires only 2 lines to be changed.
In your Python script:
import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
# Drop-in replacement: this import shadows transformers.AutoModel
from ipfs_transformers import AutoModel
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset

# Fetch the model, dataset, and FAISS index from whichever source is fastest
model = AutoModel.from_auto_download("bge-small-en-v1.5")
dataset = auto_download_dataset('Caselaw_Access_Project_JSON')
knnindex = auto_download_faiss_index('Caselaw_Access_Project_FAISS_index')
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']

# Embed the query and look up its 10 nearest neighbors
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vector, k=10)
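Because the IPFS-aware packages are meant as drop-in replacements, a script can also fall back to the stock library when they are not installed. The following is a minimal sketch using only the standard library; the helper name `import_first_available` is hypothetical and not part of any of the packages above, and the demo uses `json` as a stand-in module so that it runs anywhere.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first_available(module_names: Sequence[str]) -> ModuleType:
    """Import and return the first module in module_names that is installed."""
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            last_error = exc  # remember the failure and try the next candidate
    raise ImportError(
        f"none of {list(module_names)} could be imported"
    ) from last_error


# Demo with a stand-in: the first name does not exist, so "json" is returned.
module = import_first_available(["module_that_does_not_exist_xyz", "json"])
print(module.__name__)
```

Under the same assumption, `import_first_available(["ipfs_transformers", "transformers"]).AutoModel` would prefer the IPFS-aware class. Note that methods such as `from_auto_download` appear only in the IPFS examples here, so code relying on them would still need the IPFS package.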
Alternatively, load each artifact directly by its IPFS CID:
import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
# Drop-in replacement: this import shadows transformers.AutoModel
from ipfs_transformers import AutoModel
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset

# Fetch the model, dataset, and FAISS index by IPFS CID
model = AutoModel.from_ipfs("QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1")
dataset = ipfs_download_dataset('QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1')
knnindex = ipfs_download_faiss_index('QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1')
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']

# Embed the query and look up its 10 nearest neighbors
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vector, k=10)
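Since the example above passes raw CIDs, a quick format check can catch a mistyped or truncated identifier before any download is attempted. This is a minimal sketch, not part of the package: the helper name `looks_like_cid_v0` is hypothetical, and it only checks the textual shape of a CIDv0 (46 base58 characters starting with "Qm"), not whether the content exists on IPFS.

```python
# Base58 (Bitcoin) alphabet: digits and letters without 0, O, I and l.
BASE58_ALPHABET = set("123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz")


def looks_like_cid_v0(cid: str) -> bool:
    """Return True if cid has the textual shape of an IPFS CIDv0."""
    return (
        len(cid) == 46
        and cid.startswith("Qm")
        and all(ch in BASE58_ALPHABET for ch in cid)
    )


print(looks_like_cid_v0("QmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1"))
print(looks_like_cid_v0("bge-small-en-v1.5"))
```

CIDv1 identifiers use a different encoding and would need a separate check.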
To use it with S3 caching:
import datasets
from datasets import load_dataset
from datasets import FaissIndex
from transformers import AutoModel
# Drop-in replacement: this import shadows transformers.AutoModel
from ipfs_transformers import AutoModel
from ipfs_datasets import ipfs_load_dataset
from ipfs_datasets import auto_download_dataset
from ipfs_faiss import IpfsFaissDataset

# S3 cache settings; fill in the credentials for your bucket
s3cfg = {
    "bucket": "cloud",
    "endpoint": "https://storage.googleapis.com",
    "secret_key": "",
    "access_key": ""
}

# Fetch the model, dataset, and FAISS index, using the S3 bucket as a cache
model = AutoModel.from_auto_download(
    "bge-small-en-v1.5",
    s3cfg=s3cfg
)
dataset = load_dataset.from_auto_download(
    dataset_name="Caselaw_Access_Project_JSON",
    s3cfg=s3cfg
)
knnindex = ipfs_download_faiss_index.from_auto_download(
    dataset_name="Caselaw_Access_Project_FAISS_index",
    s3cfg=s3cfg
)
index = FaissIndex(dimension=512)
embeddings = dataset['embeddings']

# Embed the query and look up its 10 nearest neighbors
query = "What is the capital of France?"
query_vector = model.encode(query)
scores, neighbors = index.search(query_vector, k=10)
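The `s3cfg` dictionary above holds credentials, so one option is to build it from environment variables rather than writing keys into the script. The sketch below is an assumption-laden illustration: the helper `s3cfg_from_env` and the `IPFS_S3_*` variable names are hypothetical and not defined by this package; only the four dictionary keys come from the example above.

```python
import os
from typing import Dict, Mapping

# The four keys used by the s3cfg dictionary in the example above.
REQUIRED_KEYS = ("bucket", "endpoint", "access_key", "secret_key")


def s3cfg_from_env(
    env: Mapping[str, str] = os.environ,
    prefix: str = "IPFS_S3_",  # hypothetical prefix, chosen for this sketch
) -> Dict[str, str]:
    """Build an s3cfg dict from variables such as IPFS_S3_BUCKET."""
    cfg: Dict[str, str] = {}
    missing = []
    for key in REQUIRED_KEYS:
        variable = prefix + key.upper()
        value = env.get(variable, "")
        if not value:
            missing.append(variable)
        cfg[key] = value
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return cfg


# Demo with an in-memory mapping instead of the real environment.
fake_env = {
    "IPFS_S3_BUCKET": "cloud",
    "IPFS_S3_ENDPOINT": "https://storage.googleapis.com",
    "IPFS_S3_ACCESS_KEY": "example-access-key",
    "IPFS_S3_SECRET_KEY": "example-secret-key",
}
s3cfg = s3cfg_from_env(fake_env)
print(sorted(s3cfg))
```

The resulting dictionary has the same shape as the hand-written `s3cfg` and could be passed wherever the examples pass `s3cfg=s3cfg`.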
The following JSON files have been uploaded to web3storage: https://huggingface.co/datasets/endomorphosis/Caselaw_Access_Project_JSON. Alternatively, use the pin syncer (note the pins located in pin_store). Running
python3 upload_pins.py
uploads them to web3storage, Pinata, Filebase, and Lighthouse storage, and writes the new pins to a web3storage_pins.tsv file for import into datasets. Only web3storage is working right now.
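Once a pins file such as web3storage_pins.tsv exists, it can be read back with the standard `csv` module. The column layout of that file is not documented here, so this sketch assumes two tab-separated columns (a name followed by a CID); the helper `read_pins` and the sample row are hypothetical and serve only as an illustration.

```python
import csv
import io
from typing import Dict, TextIO


def read_pins(tsv_file: TextIO) -> Dict[str, str]:
    """Read name/CID pairs from a tab-separated file.

    Assumed layout (not confirmed by the project): column 0 is a name,
    column 1 is the CID. Blank or short rows are skipped.
    """
    pins: Dict[str, str] = {}
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 2:
            continue
        name, cid = row[0].strip(), row[1].strip()
        if name and cid:
            pins[name] = cid
    return pins


# Made-up sample content; a real run would use open("web3storage_pins.tsv").
sample = io.StringIO(
    "example_file.json\tQmccfbkWLYs9K3yucc6b3eSt8s8fKcyRRt24e3CDaeRhM1\n"
    "\n"
    "row_without_cid\n"
)
pins = read_pins(sample)
print(pins)
```

If the real file uses a header row or a different column order, the indexing in `read_pins` would need to be adjusted accordingly.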
IPFS Huggingface Bridge:
for huggingface datasets python library visit: https://github.com/endomorphosis/ipfs_datasets/
for transformers python library visit: https://github.com/endomorphosis/ipfs_transformers/
for transformers js client visit: https://github.com/endomorphosis/ipfs_transformers_js/
for orbitdbkit nodejs library visit: https://github.com/endomorphosis/orbitdb_kit/
for fireproof_kit nodejs library visit: https://github.com/endomorphosis/fireproof_kit
for python model manager library visit: https://github.com/endomorphosis/ipfs_model_manager/
for nodejs model manager library visit: https://github.com/endomorphosis/ipfs_model_manager_js/
for nodejs ipfs huggingface scraper with pinning services visit: https://github.com/endomorphosis/ipfs_huggingface_scraper/
Author - Benjamin Barber
QA - Kevin De Haan
File details
Details for the file ipfs_faiss_py-0.0.3.tar.gz.
File metadata
- Download URL: ipfs_faiss_py-0.0.3.tar.gz
- Upload date:
- Size: 15.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5d0e4179e8ed6c3bc681b42fdf3111c11eea47d0db6c3f3e8d2b3920ca188225
MD5 | ee5c18d9125954aa9dbfd57d465ea27d
BLAKE2b-256 | b6073bf8f8736287b38507e1d4a36ff8132cd570790feb0be3d226fe69cd4518
File details
Details for the file ipfs_faiss_py-0.0.3-py3-none-any.whl.
File metadata
- Download URL: ipfs_faiss_py-0.0.3-py3-none-any.whl
- Upload date:
- Size: 15.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | f8d7b0c5dbc8d11ea08c0afaf3584a0ac876d1ca19fc72fec54fd197a9ef2ab6
MD5 | 0669c4fa89c28cadc08e8afe03ca45c8
BLAKE2b-256 | 36a4b2afa89bc8ebf2283060e2b39085114183d5a3a78bf1dc97ee6b00de5e57