Similar sentence prediction
Predict similar sentences with more accurate results by training on your own dataset on top of a pretrained BERT model.
Setup
Install the package
pip install similar-sentences
Methods to know
SimilarSentences(FilePath,Type)
- FilePath: Path to model.zip for prediction, or to sentences.txt for training.
- Type: predict or train
.train(PreTrainedModel)
- Used for training on the sentences. Requires (".txt", "train") as the parameters to SimilarSentences.
- PreTrainedModel (optional): Any of the models below can be passed for training; by default, the first one (bert-base-nli-mean-tokens) is applied.
- bert-base-nli-mean-tokens: BERT-base model with mean-tokens pooling. Performance: STSbenchmark: 77.12
- bert-base-nli-max-tokens: BERT-base with max-tokens pooling. Performance: STSbenchmark: 77.21
- bert-base-nli-cls-token: BERT-base with cls token pooling. Performance: STSbenchmark: 76.30
- bert-large-nli-mean-tokens: BERT-large with mean-tokens pooling. Performance: STSbenchmark: 79.19
- bert-large-nli-max-tokens: BERT-large with max-tokens pooling. Performance: STSbenchmark: 78.41
- bert-large-nli-cls-token: BERT-large with CLS token pooling. Performance: STSbenchmark: 78.29
- roberta-base-nli-mean-tokens: RoBERTa-base with mean-tokens pooling. Performance: STSbenchmark: 77.49
- roberta-large-nli-mean-tokens: RoBERTa-large with mean-tokens pooling. Performance: STSbenchmark: 78.69
- distilbert-base-nli-mean-tokens: DistilBERT-base with mean-tokens pooling. Performance: STSbenchmark: 76.97
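The model names above differ partly in how per-token vectors are pooled into a single sentence vector. As a rough illustration only, using toy numbers in plain Python rather than the package's or BERT's actual code, the three pooling strategies can be sketched as below. The function names are hypothetical, and treating the first token as the CLS token follows the usual BERT convention.

```python
from typing import List

Vector = List[float]


def mean_pool(tokens: List[Vector]) -> Vector:
    """Average each dimension across all token vectors (mean-tokens)."""
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


def max_pool(tokens: List[Vector]) -> Vector:
    """Take the largest value of each dimension across tokens (max-tokens)."""
    return [max(column) for column in zip(*tokens)]


def cls_pool(tokens: List[Vector]) -> Vector:
    """Use only the first token's vector (assumed here to be the CLS token)."""
    return list(tokens[0])


# Three toy "token embeddings" with two dimensions each (made-up values).
toy_tokens = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]

print(mean_pool(toy_tokens))  # [2.0, 2.0]
print(max_pool(toy_tokens))   # [3.0, 4.0]
print(cls_pool(toy_tokens))   # [1.0, 4.0]
```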
.predict(InputSentences, NumberOfPrediction, DesiredJsonOutput)
- Used for predicting similar sentences. Requires (".zip", "predict") as the parameters to SimilarSentences.
- InputSentences: The sentences to find similar sentences for.
- NumberOfPrediction: Number of results to return for the prediction.
- DesiredJsonOutput: The output will be in JSON format. simple produces a plain output; detailed produces detailed output with scores.
.reload()
- Used for reloading or updating the model. Requires (".zip", "predict") as the parameters to SimilarSentences.
.batch_predict(BatchFile,NumberOfPrediction)
- Exports the results in Excel format with three columns: ['Sentence', 'Suggestion', 'Score'].
- BatchFile: Batch file with the sentences to predict; it has to be in .txt format.
- NumberOfPrediction: Number of results to return for the prediction.
Getting Started
Train the model with your dataset
Prepare your dataset and save the content to sentences.txt
Hi, thanks for contacting.
Hello there!
Hi there, welcome!
Hi, how can I help?
In a few words, how can help?
Hi again, welcome back.
Hi! Welcome back.
Good morning!
Good afternoon!
Good evening!
Good morning! Welcome.
Good afternoon! Welcome.
Good evening! Welcome.
Hello, how can I help?
Welcome.
Welcome back.
Thanks for contacting.
Goodbye!
Thanks for contacting. Goodbye!
Thanks for contacting. Bye!
Happy to help!
Glad I could help!
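The sample dataset above uses one sentence per line. If your sentences live in a Python list, a minimal sketch for writing such a file could look like this. The helper name write_sentences is hypothetical, and skipping blank lines and exact duplicates is an assumption about what is useful for training, not a documented requirement of the package.

```python
from pathlib import Path
from typing import Iterable, List


def write_sentences(sentences: Iterable[str], path: str = "sentences.txt") -> List[str]:
    """Write one sentence per line, skipping blanks and exact duplicates."""
    seen = set()
    cleaned: List[str] = []
    for sentence in sentences:
        # Collapse runs of whitespace so a stray newline cannot split a sentence.
        sentence = " ".join(sentence.split())
        if sentence and sentence not in seen:
            seen.add(sentence)
            cleaned.append(sentence)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

Calling write_sentences(["Hello there!", "Welcome back."]) would create sentences.txt in the current directory and return the sentences that were kept.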
Supply the sentences to build the model.
from SimilarSentences import SimilarSentences
# Make sure the extension is .txt
model = SimilarSentences('sentences.txt',"train")
model.train()
The code snippet will produce model.zip.
Predicting from your model
Load the model.zip produced by training.
from SimilarSentences import SimilarSentences
model = SimilarSentences('model.zip',"predict")
text = 'Hi.How are you doing?'
simple = model.predict(text, 2, "simple")
detailed = model.predict(text, 2, "detailed")
print(simple)
print(detailed)
The output looks like this:
#simple output
[
"Hello there! Did I get that right?",
"Right Hi, how can I help?"
]
#detailed output
[
[
{
"sentence": "Hello there!",
"score": 0.938870553799856
},
{
"sentence": "Did I get that right?",
"score": 0.7910412586610753
}
],
[
{
"sentence": "Right",
"score": 0.9161810654762793
},
{
"sentence": "Hi, how can I help?",
"score": 0.7824734658953297
}
]
]
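In the sample above, each string in the simple output is the space-joined sentences of the matching inner list in the detailed output. A minimal sketch that rebuilds those strings and attaches the lowest score in each group follows. It assumes the detailed result arrives either as a JSON string or as an already parsed list with this shape (the README does not say which), and summarize is a hypothetical helper, not part of the package.

```python
import json
from typing import Any, Dict, List, Union

Detailed = List[List[Dict[str, Any]]]


def summarize(detailed: Union[str, Detailed]) -> List[Dict[str, Any]]:
    """Join each group's sentences and record the group's lowest score."""
    groups = json.loads(detailed) if isinstance(detailed, str) else detailed
    summary = []
    for group in groups:
        text = " ".join(item["sentence"] for item in group)
        # default=0.0 keeps an empty group from raising ValueError.
        weakest = min((item["score"] for item in group), default=0.0)
        summary.append({"text": text, "min_score": weakest})
    return summary


# The detailed sample output shown above, as a JSON string.
sample = """
[
  [{"sentence": "Hello there!", "score": 0.938870553799856},
   {"sentence": "Did I get that right?", "score": 0.7910412586610753}],
  [{"sentence": "Right", "score": 0.9161810654762793},
   {"sentence": "Hi, how can I help?", "score": 0.7824734658953297}]
]
"""

for entry in summarize(sample):
    print(entry["text"], round(entry["min_score"], 3))
```

The lowest score is used here as a conservative confidence value for the joined text; a mean or a threshold filter would be equally reasonable choices.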
:+1: :sparkles: :camel: :tada: :rocket: :metal: :octocat: HAPPY CODING :octocat: :metal: :rocket: :tada: :camel: :sparkles: :+1: