No project description provided
Project description
Bani
This package aims to provide an easy way to set up a question answering system,
Taking as input just raw text question answer pairs. The principal used is question similirity, ie the most similar question to a
given query is found and the answer corrosponding to the said question is answered. For this purpose KNN algorithm is used, And Batch hard Tripet
Loss is used to train a sentence transformer model.
Installation
Install with pip
pip install Bani
python -m spacy download en_core_web_md
This will install all the necessary packages , including the correct version of sentence transformers and transformers.
Copy the source code
Clone or download the source and then
python -m spacy download en_core_web_md
cd Bani ; pip install -r requirements
Getting Started
See the tutorial notebook for a quick introduction to the usage of the package.
Docs
FAQ
class FAQ (self,name : str,questions : List[str] = None, answers : List[str] = None)
All the user supplied FAQs are stored in the FAQ class, The FAQ class further runs sanity checks on the faqs ,and provides interface to
generate questions and assign vectors.
Parameters
name : The name of an FAQ , all FAQs must have unique names.
questions : list of questions or None.
answers : list of corrosponding answers or None.
(if questions are None answers must also be None , and the FAQ will be empty , you can load this empty faq with another presaved FAQ)
Methods
getAnswerWithLabel(self, label : int) -> Answer
getQuestionWithLabel(self, label : int) -> Question
buildFAQ(self,generator : GenerateManager,model = None) : this method will generate questions using the given generator , and
if the model is also provided , it will assign the vectors to questions as well.
isEmpty(self) -> bool : Returns true if the FAQ is empty
isUsable(self) -> bool : Returns true if buildFAQ has been called and questions are generated.
hasVectorsAssigned(self) -> bool : Returns true if all the questions have vectors assigned.
load(self,rootDirPath) -> None : Loads the FAQ with the name as self.name within the root directory.
save(self,rootDirPath) -> None : Saves the current object (self) as (self.name).pkl in the root directory.
resetAssignedVectors -> None : Resets all the FAQ's assigned vectors to None.
resetFAQ -> None : Resets the FAQ to an empty FAQ.
GenerateManager
class GenerateManager (self , producers : List[Any], names : List[str] = None, nums : List[int] = None)
The GenerateManager is the interface where the user can register their own sentence prodicers. The class takes care of
how to run the producers (multi processing , multi threading or single process).
Parameters
producers : list of producers (A producer is an instance of any class that implements either batch_generate method or exact_batch_generate).
names : list of names of the producers , each producer must have a unique name.
nums : list of numbers , each number indicates the max number of questions to generate from the producer.
Methods
addProducer(self,producer , name : str , toGenerate : int) : adding producer , the name must be different from the preexisting ones.
producerList(self) -> Tuple[List[str],List[int],List[Any]] : returns the names,nums and producers that are registered.
removeProducer(self, name) -> None : remove a producer from the generateManager.
Bani
class Bani(self,FAQs : List[FAQ], modelPath : str = None, assignVectors : bool = True):
The class that acts as the chatbot , It registers any number of FAQs , trains a model on the FAQs and then answers the questions on these FAQs.
Parameters
FAQs : list of instances of FAQ class. (each FAQ is given a unique id)
modelPath : The path to a pretrained model , or any model from the sentence transformers models , if None then the roberta model is pulled.
assignVectors : Whether to assign vectors wrt the new model, if true every question in all FAQs are passed through the current model , and new
vectors are assigned, if false then all the FAQs should have re assigned vectors.
Methods
train(self,outputPath : str,batchSize = 16, epochs : int = 1, **kwargs) : method to train the model , after training the new model is loaded and
the FAQ vectors are reassigned using this model.
saveFAQs(self, rootDirPath : str) : method to save the FAQ with vectors assigned to rootDirPath , so that the next time you can set,
assignVectors to False, if you are loading these FAQs (Just to save time).
getFAQWithId(self, id : int) -> FAQ: method to get the faq wrt the given id , the indexing starts from 0.
findClosestFromFAQ(self,faqId : int, query : str, K : int = 3, topSimilar : int = 5) -> FAQOutput : Takes in a user query and runs the knn algo over it.
with K as K, and returns a FAQOutput object, whick topSimilar number of closest questions. The query is processed only
over the 'faqId' FAQ.
findClosest(self,query : str, K : int = 3 , topSimilar : int = 5) -> List[FAQOutput] : The same as findClosestFromFAQ, but here the query is run over all the,
FAQs and the result is a list of FAQOutputs , the length of the list is the same as the number of FAQs.
test(self,faqId : int,testData : List[Tuple[str,str]], K : int = 3) -> float: Interface to test any given faq , expects a list of tuples of size 2
first element is the orignal question and second is the paraphrased version. All the orignal question should ideally match the questions in the FAQ , if not you will be warned about it.
FAQOutput
The user will get this , or a list of FAqOutput ,as the output for any query. It contains.
answer : Answer : The actual answer
question : Question : The question that is being answered. (A generated question may be being answered, but only orignal question is given here)
faqName : str, : name of the faq the answer is from.
faqId : int, : Id of the FAQ wrt the Bani object.
score : float : Combined KNN score
similarQuestions : List[str] : Similar questions to the query asked , from the said FAQ.
maxScore : float : The question with maximum similirity with the query.
Adding your own producers(sentence_generator)
The quality of the FAQ is directely related to the quality of questions produced, As such Bani comes with a default
question generation pipeline , but also gives full freedom to customize or add your own producers.
A producer is an instance of any class that implements either batch_generate method or exact_batch_generate
class MyProducer1:
def __init__(self):
pass
def batch_generate(questions : List[str]) -> Dict[str, List[str]]:
"""
Takes list of questions and returns a dict , with each question
mapped to the list of generated questions
"""
resultDict = dict()
for question in questions:
resultDict[question] = ["generated1", "generated2", "and so on"]
return resultDict
The objects that implement exact_batch_generate will produce at most n questions for a given question.
class MyProducer2:
def __init__(self):
pass
def exact_batch_generate(questions : List[str], num : int) -> Dict[str, List[str]]:
"""
Takes list of questions and returns a dict , with each question
mapped to the list of generated questions , for each question at most num questions are generated
"""
resultDict = dict()
for question in questions:
resultDict[question] = ["generated1", "generated2", "and so on"]
return resultDict
Each of the producers are registered in a GenerateManager , with their names and how many questions to generate at max from
the producer.
from Bani.core.generation import GenerateManager
names = ["myProducer1_name", "myProducer2_name"]
toGenerate = [3,5] # At max generate 3 for first producer and 5 for second
producers = [MyProducer1(), MyProducer2()]
myGenerateManager = GenerateManager(producers = producers , names = names , nums = toGenerate)
# Or you can register the producers one by one
myGenerateManager.addProducer(producer = myProducer3, name = "myProducer3Name", togenerate = 5)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file Bani-0.7.2.tar.gz
.
File metadata
- Download URL: Bani-0.7.2.tar.gz
- Upload date:
- Size: 31.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.7.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 03c12d00a2aaa755d63abfb058e57a37596971c28b70a22c91c026d018ea0299 |
|
MD5 | 39a9c7c8d72171e566ffac9ab4e5e62d |
|
BLAKE2b-256 | 8aef4d7d4c879182d009b66c12232a3af08fcf1056a1e04653054ce458d76481 |
File details
Details for the file Bani-0.7.2-py3-none-any.whl
.
File metadata
- Download URL: Bani-0.7.2-py3-none-any.whl
- Upload date:
- Size: 47.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.7.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | afdad519c4adfee302e11a92d1994f276210b5730af1e7f373adf538b8965c72 |
|
MD5 | 46ef99f403f04ca015e1344feb4dc736 |
|
BLAKE2b-256 | e1f18694172629cec21b47616cafd8b4bb493e9a4662e2bdfc0d18aa14767d75 |