Sentence-Transformer embeddings for scikit-learn

sklearn-embeddings

Overview

sklearn-embeddings is a Python package that integrates sentence-transformer embeddings with scikit-learn classifiers and clustering algorithms. Text embedding becomes an ordinary step in the familiar scikit-learn framework, so raw sentences can be passed directly to a pipeline.

Installation

To install sklearn-embeddings, you can use pip:

pip install sklearn-embeddings
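
To confirm that the install worked, the standard library can report the installed version. This is a minimal sketch: `installed_version` is a hypothetical helper written for illustration, not part of sklearn-embeddings, and it only assumes the distribution name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="sklearn-embeddings"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("sklearn-embeddings is not installed in this environment")
else:
    print(f"sklearn-embeddings {found} is installed")
```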

Usage

Here is a simple example of how to use sklearn-embeddings with a scikit-learn classifier:

import joblib

from sklearn_embeddings import SentenceTransformerEmbedding
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Sample data
documents = ["The food was great.", "Not expensive and good service", "Not worth the money", "I've had better"]

# Labels
is_positive = [True, True, False, False]

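# Create a pipeline with the embedding model and a classifier.
# The commented-out lines show other ways to choose the model:
# by model name, by local folder, or by passing a SentenceTransformer
# instance (which needs `from sentence_transformers import SentenceTransformer`).
pipeline = make_pipeline(
    SentenceTransformerEmbedding(),
    # SentenceTransformerEmbedding('paraphrase-MiniLM-L6-v2'),
    # SentenceTransformerEmbedding('/my/local/folder/paraphrase-MiniLM-L6-v2'),
    # SentenceTransformerEmbedding(SentenceTransformer('paraphrase-MiniLM-L6-v2')),
    LogisticRegression()
)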

# Fit the model
pipeline.fit(documents, is_positive)

# Make predictions
predictions = pipeline.predict(["So delicious!", "Not for me"])

# Write the whole pipeline to disk
joblib.dump(pipeline, 'model.joblib')

Perhaps the greatest benefit of this library is that a scikit-learn pipeline combines encoding and labeling in a single function call. Because the whole pipeline was saved, loading it back gives a model that accepts raw sentences:

import joblib

model = joblib.load('model.joblib')

# Use the loaded pipeline as a simple model; it takes care of sentence-transformer encoding for you.
model.predict(["This is a sentence"])
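
The overview also mentions clustering, which follows the same pipeline pattern as the classifier. The sketch below shows the shape of such a pipeline without downloading a model: `ToyEmbedding` is a hypothetical stand-in written for this example (it hashes each string into a fixed-length vector and carries no semantic meaning), and it is not the package's implementation. It assumes that `SentenceTransformerEmbedding` follows the usual scikit-learn transformer contract (`fit` and `transform`), as its use in `make_pipeline` above suggests.

```python
import hashlib

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline


class ToyEmbedding(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for an embedding step (illustration only)."""

    def __init__(self, dim=8):
        self.dim = dim

    def fit(self, X, y=None):
        # Nothing is learned here; a pretrained encoder needs no fitting either.
        return self

    def transform(self, X):
        # Map each string to a deterministic vector built from its SHA-256 digest.
        rows = []
        for text in X:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            rows.append([byte / 255.0 for byte in digest[: self.dim]])
        return np.asarray(rows, dtype=float)


documents = [
    "The food was great.",
    "Not expensive and good service",
    "Not worth the money",
    "I've had better",
]

# Embedding step followed by a clustering algorithm instead of a classifier.
clusterer = make_pipeline(
    ToyEmbedding(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)

# fit_predict embeds the documents and assigns each one a cluster id.
cluster_ids = clusterer.fit_predict(documents)
print(cluster_ids)
```

In real use, `SentenceTransformerEmbedding()` would take the place of `ToyEmbedding()`, so that the clusters reflect sentence meaning rather than hash values.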
