Sentence-Transformer embeddings for scikit-learn
sklearn-embeddings
Overview
sklearn-embeddings is a Python package that integrates sentence-transformer-based embeddings with scikit-learn classifiers and clustering algorithms. This lets you use pretrained sentence embeddings as a regular step within the familiar scikit-learn framework.
Installation
To install sklearn-embeddings, you can use pip:
pip install sklearn-embeddings
Usage
Here is a simple example of how to use sklearn-embeddings with a scikit-learn classifier:
import joblib
from sklearn_embeddings import SentenceTransformerEmbedding
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
# Sample data
documents = ["The food was great.", "Not expensive and good service", "Not worth the money", "I've had better"]
# Labels
is_positive = [True, True, False, False]
# Create a pipeline with the embedding model and a classifier
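pipeline = make_pipeline(
    SentenceTransformerEmbedding(),
    # Alternatives: pass a model name, a local model folder, or a SentenceTransformer
    # instance (the last one needs `from sentence_transformers import SentenceTransformer`):
    # SentenceTransformerEmbedding('paraphrase-MiniLM-L6-v2'),
    # SentenceTransformerEmbedding('/my/local/folder/paraphrase-MiniLM-L6-v2'),
    # SentenceTransformerEmbedding(SentenceTransformer('paraphrase-MiniLM-L6-v2')),
    LogisticRegression()
)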
# Fit the model
pipeline.fit(documents, is_positive)
# Make predictions
predictions = pipeline.predict(["So delicious!", "Not for me"])
# Write the whole pipeline to disk
joblib.dump(pipeline, 'model.joblib')
Perhaps the greatest benefit of this library is that a scikit-learn pipeline combines encoding and labeling in a single function call. After loading a saved pipeline, you can pass it raw text directly:
import joblib
model = joblib.load('model.joblib')
# Use the loaded pipeline as a simple model; it takes care of sentence-transformer encoding for you
model.predict(["This is a sentence"])
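To illustrate why an embedding step can sit in front of any scikit-learn estimator, here is a minimal sketch of the transformer contract that pipelines rely on. It does not reproduce this package's implementation: `ToyHashEmbedding` is a hypothetical stand-in that hashes tokens into a small count vector instead of calling a sentence-transformer model, so the example runs without downloading anything. The same step is then placed in front of both a classifier and a clustering algorithm.

```python
import hashlib

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


class ToyHashEmbedding(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for an embedding step (not the library's code)."""

    def __init__(self, n_features=16):
        self.n_features = n_features

    def fit(self, X, y=None):
        # Nothing to learn here: the encoder is fixed, much like a pretrained model.
        return self

    def transform(self, X):
        # Turn each document into a fixed-length vector of hashed token counts.
        vectors = np.zeros((len(X), self.n_features))
        for row, text in enumerate(X):
            for token in text.lower().split():
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                vectors[row, digest[0] % self.n_features] += 1.0
        return vectors


documents = ["The food was great.", "Not expensive and good service",
             "Not worth the money", "I've had better"]
is_positive = [True, True, False, False]

# Classification: raw text in, labels out.
toy_pipeline = make_pipeline(ToyHashEmbedding(), LogisticRegression())
toy_pipeline.fit(documents, is_positive)
toy_predictions = toy_pipeline.predict(["So delicious!", "Not for me"])

# Clustering: the same embedding step in front of KMeans.
toy_clusterer = make_pipeline(
    ToyHashEmbedding(), KMeans(n_clusters=2, n_init=10, random_state=0)
)
cluster_labels = toy_clusterer.fit_predict(documents)
```

The toy vectors carry no semantic meaning, so the predictions themselves are not informative. The point is only that a step exposing `fit` and `transform` lets the pipeline accept raw strings.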