document QA package using langchain and chromadb



License: MIT. Supported Python versions: 3.8, 3.9, 3.10.

Akasha simplifies document-based Question Answering (QA) by using Large Language Models (LLMs) to answer your queries from the documents you provide. It uses Retrieval Augmented Generation (RAG) so that the LLM generates answers grounded in those documents.

With Akasha, you have the flexibility to choose from a variety of language models, embedding models, and search types. Adjusting these parameters is straightforward, allowing you to optimize your approach and discover the most effective methods for obtaining accurate answers from Large Language Models.

A Chinese-language manual is also available.


We recommend using Python 3.9 to run the akasha package. You can use Anaconda to create a virtual environment.

# create environment

conda create --name py3-9 python=3.9
conda activate py3-9

# install akasha
pip install akasha-terminal

API Keys


If you want to use OpenAI models or embeddings, go to openai to get an API key. You can either save OPENAI_API_KEY=your api key in a .env file in the current working directory, or set it as an environment variable, using export in bash or os.environ in Python.

# set an environment variable

export OPENAI_API_KEY="your api key"


If you want to use Azure OpenAI, go to azureAI and get your own Language API base url and key. Also, remember to deploy all the models in Azure OpenAI Studio; the deployment name should be the same as the model name. Save OPENAI_API_KEY=your azure key, OPENAI_API_BASE=your Language API base url, OPENAI_API_TYPE=azure, OPENAI_API_VERSION=2023-05-15 in a .env file in the current working directory.

If you want to keep both an OpenAI key and an Azure key at the same time, you can use AZURE_API_KEY, AZURE_API_BASE, AZURE_API_TYPE and AZURE_API_VERSION for the Azure settings.

## .env file
AZURE_API_KEY={your azure key}
AZURE_API_BASE={your Language API base url}
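A .env file is a plain list of KEY=value lines. As a rough illustration of what loading it amounts to, here is a minimal standard-library sketch. In practice a library such as python-dotenv does this job; the helper name load_env_file is hypothetical and is not part of akasha.

```python
import os


def load_env_file(path: str) -> dict:
    """Read KEY=value lines from `path` into os.environ (hypothetical helper)."""
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # skip blank lines, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            # keep values that are already set in the shell
            os.environ.setdefault(key, value)
            loaded[key] = os.environ[key]
    return loaded
```

Calling load_env_file(".env") before creating a Doc_QA object would make the keys visible through os.environ, assuming the file sits in the current working directory.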

And now we can run akasha in python

import akasha
ak = akasha.Doc_QA(model="openai:gpt-3.5-turbo")
response = ak.get_response(dir_path, prompt)


If you want to use the original meta-llama models, you need to both register on huggingface to get an access token and request access from meta-llama.

Remember, the account on Hugging Face and the email you use to request access to Meta-Llama must be the same, so that you can download models from Hugging Face once your account is approved.

Once your account is approved, the model page should show the message "Gated model You have been granted access to this model".

Again, you can either save HUGGINGFACEHUB_API_TOKEN=your api key in a .env file in the current working directory, or set it as an environment variable, using export in bash or os.environ in Python. After you create a Doc_QA object, you can still change the model when you call a function.

# set an environment variable

export HUGGINGFACEHUB_API_TOKEN="your api key"
import akasha
ak = akasha.Doc_QA()
response = ak.get_response(dir_path, prompt, model="hf:meta-llama/Llama-2-7b-chat-hf")

Example and Parameters

Basic get_response OpenAI example

import akasha
import os

os.environ["OPENAI_API_KEY"] = "your openAI key"

dir_path = "doc/"
prompt = "「塞西莉亞花」的花語是什麼?	「失之交臂的感情」	「赤誠的心」	「浪子的真情」	「無法挽回的愛」"
ak = akasha.Doc_QA()
response = ak.get_response(dir_path, prompt)

Select different embeddings

Using the parameter "embeddings", you can choose different embedding models. The embedding model is used to store documents in vector storage and to search for documents relevant to the prompt. The default is openai:text-embedding-ada-002.

Currently supported: openai, huggingface and tensorflowhub.

huggingface example

ak = akasha.Doc_QA(embeddings="huggingface:all-MiniLM-L6-v2")
response = ak.get_response(dir_path, prompt)

To use huggingface embedding models, you can type huggingface:model_name or hf:model_name, for example, huggingface:all-MiniLM-L6-v2
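Model and embedding names throughout this document follow a "type:name" pattern. The sketch below shows one way such a string could be split, treating hf as an alias of huggingface as described above. The function split_spec and the ALIASES table are hypothetical illustrations, not akasha's own parsing code.

```python
# Assumed alias table, based only on the prefixes mentioned in this document.
ALIASES = {"hf": "huggingface"}


def split_spec(spec: str) -> tuple:
    """Split "type:name" into (type, name); hypothetical helper, not akasha's API."""
    # partition splits on the first colon only, so names containing ":" survive
    prefix, sep, name = spec.partition(":")
    if not sep:
        raise ValueError(f"expected 'type:name', got {spec!r}")
    prefix = prefix.strip().lower()
    return ALIASES.get(prefix, prefix), name.strip()
```

With this helper, "hf:all-MiniLM-L6-v2" and "huggingface:all-MiniLM-L6-v2" resolve to the same pair.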

Select different models

Using the parameter "model", you can choose different text generation models. The default is openai:gpt-3.5-turbo.

Currently supported: openai, llama-cpp, huggingface and remote.

1. openai example

ak = akasha.Doc_QA()
ak.get_response(dir_path, prompt, embeddings="openai:text-embedding-ada-002", model="openai:gpt-3.5-turbo")

2. huggingface example

ak = akasha.Doc_QA()
ak.get_response(dir_path, prompt, embeddings="huggingface:all-MiniLM-L6-v2", model="hf:meta-llama/Llama-2-13b-chat-hf")

To use a text generation model from huggingface, for example meta llama, you can type hf:meta-llama/Llama-2-13b-chat-hf

3. llama-cpp example

llama-cpp can use a quantized llama model and run on CPU, after you download or transfer a llama-cpp model file using llama-cpp-python.

ak = akasha.Doc_QA()
ak.get_response(dir_path, prompt, embeddings="huggingface:all-MiniLM-L6-v2", model="llama-cpu:model/llama-2-13b-chat.Q5_K_S.gguf")

For example, if the q5 model is in the "model/" directory, you can assign llama-cpu:model/llama-2-13b-chat.Q5_K_S.gguf to load the model.

ak = akasha.Doc_QA()
ak.get_response(dir_path, prompt, embeddings="huggingface:all-MiniLM-L6-v2", model="llama-gpu:model/llama-2-3b-chat.Q5_K_S.gguf")

You can also combine GPU with CPU to run llama-cpp, using llama-gpu:model/llama-2-13b-chat.Q5_K_S.gguf

4. remote server api example

If you deploy your own language model on another server using TGI (Text Generation Inference), you can use remote:{your LLM api url} to call the model.

ak = akasha.Doc_QA()
ak.get_response(dir_path, prompt,  model="remote:")

5. gptq quantized model

If you download a gptq quantized model, you can use gptq:{model_name} to call the model.

ak = akasha.Doc_QA()
ak.get_response(dir_path, prompt,  model="gptq:FlagAlpha/Llama2-Chinese-13b-Chat-4bit")

Select different search type

Using the parameter "search_type", you can choose different search methods to find similar documents. The default is merge, which is a combination of mmr, svm and tfidf; auto is another strategy that combines bm25/tfidf with svm. Currently you can select merge, mmr, svm, tfidf, bm25, auto and auto_rerank.

Max Marginal Relevance (mmr) selects similar documents by cosine similarity, but it also considers diversity, so it penalizes a document for closeness to already selected documents.
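The trade-off between relevance and diversity can be sketched in a few lines. This is a generic, standard-library illustration of maximal marginal relevance, not akasha's implementation; the weight lam and the function names are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(query, docs, k=2, lam=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            # penalty: similarity to the closest already-selected document
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With lam=1.0 the function degenerates to plain similarity ranking; lower values favor documents that differ from those already picked.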

Support Vector Machines (svm) uses the input prompt and the document vectors to train an SVM model; after training, the SVM can be used to score new vectors based on their similarity to the training data.

Term Frequency–Inverse Document Frequency(tfidf) is a commonly used weighting technique in information retrieval and text mining. TF-IDF is a statistical method used to evaluate the importance of a term in a collection of documents or a corpus with respect to one specific document in the collection.
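As a small worked illustration of the weighting, the sketch below computes the classic unsmoothed variant, tf × log(N / df), over pre-tokenized documents. Libraries usually apply smoothing and normalization on top of this, so treat it as a teaching example rather than what akasha computes internally.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n = len(docs)
    # document frequency: in how many documents each term occurs
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights
```

A term that occurs in every document gets weight 0, while a term concentrated in one document gets a high weight there.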

Okapi BM25(bm25) (BM is an abbreviation of best matching) is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document, regardless of their proximity within the document. It is a family of scoring functions with slightly different components and parameters.
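The scoring function can be sketched as follows. This is one common BM25 variant (with the "+1" inside the logarithm so the idf term stays non-negative) using the conventional parameters k1=1.5 and b=0.75; these choices are assumptions for illustration and may differ from what akasha uses.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # length normalization: longer-than-average documents are damped
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents are then ranked by descending score; a document that contains none of the query terms scores 0.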

ak = akasha.Doc_QA(search_type="merge")
ak.get_response(dir_path, prompt, search_type="mmr")

Some models you can use

Please note that for OpenAI models you need to set the environment variable OPENAI_API_KEY, and most Hugging Face models require a GPU to run. However, .gguf models can run on a CPU.

openai_model = "openai:gpt-3.5-turbo"  # need environment variable "OPENAI_API_KEY" or "AZURE_API_KEY"
openai4_model = "openai:gpt-4"  # need environment variable "OPENAI_API_KEY" or "AZURE_API_KEY"
openai4o_model = "openai:gpt-4o" # need environment variable "OPENAI_API_KEY"
huggingface_model = "hf:meta-llama/Llama-2-7b-chat-hf" #need environment variable "HUGGINGFACEHUB_API_TOKEN" to download meta-llama model
quantized_ch_llama_model = "gptq:FlagAlpha/Llama2-Chinese-13b-Chat-4bit"
taiwan_llama_gptq = "gptq:weiren119/Taiwan-LLaMa-v1.0-4bits-GPTQ"
mistral = "hf:Mistral-7B-Instruct-v0.2" 
mediatek_Breeze = "hf:MediaTek-Research/Breeze-7B-Instruct-64k-v0.1"
### If you want to use llama-cpp to run a model on cpu, you can download the gguf version of the model;
### the name after "llama-gpu:" or "llama-cpu:" is the path of the downloaded .gguf file
llama_cpp_model = "llama-gpu:model/llama-2-13b-chat-hf.Q5_K_S.gguf"  
llama_cpp_model = "llama-cpu:model/llama-2-7b-chat.Q5_K_S.gguf"
llama_cpp_chinese_alpaca = "llama-gpu:model/chinese-alpaca-2-7b.Q5_K_S.gguf"
llama_cpp_chinese_alpaca = "llama-cpu:model/chinese-alpaca-2-13b.Q5_K_M.gguf"
chatglm_model = "chatglm:THUDM/chatglm2-6b"

Some embeddings you can use

Please note that each embedding model has a different window size; text beyond the max seq length is truncated and is not represented in the embedding.

Rerank_base and rerank_large are not embedding models; instead, they compare the query to each chunk of the documents and return scores that represent the similarity. As a result, they offer higher accuracy compared to embedding models but may be slower.

openai_emd = "openai:text-embedding-ada-002"  # need environment variable "OPENAI_API_KEY"  # 8192 max seq length
huggingface_emd = "hf:all-MiniLM-L6-v2" 
text2vec_ch_emd = "hf:shibing624/text2vec-base-chinese"   # 128 max seq length 
text2vec_mul_emd = "hf:shibing624/text2vec-base-multilingual"  # 256 max seq length
text2vec_ch_para_emd = "hf:shibing624/text2vec-base-chinese-paraphrase" # perform better for long text, 256 max seq length
bge_en_emd = "hf:BAAI/bge-base-en-v1.5"  # 512 max seq length
bge_ch_emd = "hf:BAAI/bge-base-zh-v1.5"  # 512 max seq length

rerank_base = "rerank:BAAI/bge-reranker-base"    # 512 max seq length
rerank_large = "rerank:BAAI/bge-reranker-large"  # 512 max seq length


Use chain-of-thought to solve complicated problem

Instead of inputting a single prompt, you can input multiple small step-by-step questions to get a better answer.

import akasha
import os

os.environ["OPENAI_API_KEY"] = "your openAI key"

dir_path = "mic/"
queries2 = ["西門子自有工廠如何朝工業4.0 發展","詳細解釋「工業4.0 成熟度指數」發展路徑的六個成熟度","根據西門子自有工廠朝工業4.0發展,探討其各項工業4.0的成熟度指標"]
ak = akasha.Doc_QA()
response = ak.chain_of_thought(dir_path, queries2, search_type='svm')
response 1:

1. 數位化戰略:西門子提出數位化戰略,從工業4.0策略擬定到落地執行,為客戶提供一條龍服務。他們設計數位工廠原型

2. 跨領域合作:西門子近年積極與雲服務商、系統商等跨領域合作,推動智慧製造解決方案。此外,他們也與SAP進行ERP整合,專注於物聯網領域。

3. 虛實整合:西門子在中國大陸成都生產研發基地的案例中,從研發、生產、訂單管理、供應商管理到物流作業


response 2:

1. 電腦化:這是工業4.0發展的起點,指企業開始使用計算機技

2. 可連結:在這個成熟度階段,企業開始將各個IT系統進行連接,實現資料的串聯。這使得不同系統之間可以共享資料,提高資訊的流通效率。例 

3. 可視化:在這個成熟度階段,企業開始實現資料的可視化,將資料以圖形化或圖表化的方

4. 可分析:在這個成熟度階段,企業開始進 

5. 可預測:在這個成熟度階段,企業開始利用資料分析的結果來進行預測和預測模型的建立。這使得企業可以預測生產過程中可能出現的問題,並 

6. 自適應:在這個成熟度階段,企業開始實現自動化和自適應能 


response 3:


1. 數位化戰略:西門子提出數位化戰略,從工業4.0策略擬定到落地執行提供一條龍服務


2. 整合系統:西門子在廠內進行軟體間整合,包括PLM、ERP、MOM 


3. 數據 應用:西門子利用自有的數位雙生軟體Tecnomatix,打造虛擬工廠,模擬生產狀況或監控實際生產狀況。這代表企業在工業4.0成熟度指標中已經達到了可分析和可預測的階段,並能夠利用 



Ask questions from a single file

If you only have a single short document file, you can use ask_whole_file to ask the LLM with the whole document. Note that the length of the document cannot be larger than the window size of the model.


import akasha

ak = akasha.Doc_QA(

response = ak.ask_whole_file(system_prompt="用列舉的方式描述"

1. 「工業 4.0成熟度指數」:由德國國家工程院(Acatech)提出,將發展階段劃分為電腦化、可連結、可視化、可分析、可預測、自適應共六個成熟度,前項為後項發展基礎。

2. 「新加坡工業智慧指數」(Singapore Smart Industry Readiness Index, SIRI):由新加坡政府提出,用於評估企業在工業4.0的發展程度。

3. 「工業 4.0實施步驟方法論」:這是一種實施工業4.0的具體步驟,包括盤點公司內部待改善問題,分析現況與預期目標差異,以及規劃具體要改善的業務流程路線圖。

Directly offer information to ask questions

If you do not want to use any document file, you can use the ask_self function and pass the information you need with the parameter info; info can be a string or a list of strings.


install_requires = [

ak = akasha.Doc_QA(
response = ak.ask_self(prompt="langchain的套件版本?", info=install_requires)

### Arguments of Doc_QA class ###
            **embeddings (str, optional)**: the embeddings used in query and vector storage. Defaults to "text-embedding-ada-002".
            **chunk_size (int, optional)**: chunk size of texts from documents. Defaults to 1000.
            **model (str, optional)**: llm model to use. Defaults to "gpt-3.5-turbo".
            **verbose (bool, optional)**: show log texts or not. Defaults to False.
            **threshold (float, optional)**: the similarity threshold of searching. Defaults to 0.2.
            **language (str, optional)**: the language of documents and prompt, use to make sure docs won't exceed
                max token size of llm input.
            **search_type (str, optional)**: search type to find similar documents from db, default 'merge'.
                includes 'merge', 'mmr', 'svm', 'tfidf', also, you can custom your own search_type function, as long as your
                function input is (query_embeds:np.array, docs_embeds:list[np.array], k:int, relevancy_threshold:float, log:dict)
                and output is a list [index of selected documents].
            **record_exp (str, optional)**: use aiido to save running params and metrics to the remote mlflow or not if record_exp not empty, and set record_exp as experiment name.  default "".
            **system_prompt (str, optional)**: the system prompt that you assign special instruction to llm model, so will not be used
                in searching relevant documents. Defaults to "".
            **max_doc_len (int, optional)**: max document size of llm input. Defaults to 3000.
            **temperature (float, optional)**: temperature of llm model from 0.0 to 1.0 . Defaults to 0.0.
            **use_chroma (bool, optional)**: use chroma db name instead of documents path to load data or not. Defaults to False.
            **use_rerank (bool, optional)**: use rerank model to re-rank the selected documents or not. Defaults to False.
            **ignore_check (bool, optional)**: speed up loading data if the chroma db is already existed. Defaults to False.


If you want to ask a complex question, use ask_agent. It uses the LLM to get intermediate answers to the question, which can help the LLM give a better response.


ak = akasha.Doc_QA(
res = ak.ask_agent(
    doc_path="./docs/mic/",  #   

LPWAN的頻寬相對較低(0.3 KBps  50KBps),延遲較高(秒 - 分),且成本較低。它的主要優點是低耗能、支援長距離傳輸,並且可以連接大量的設備。然而,由於其頻寬和延遲的限制,LPWAN在製造業中的

相較之下,5G提供的頻寬範圍在1-10 Gbps,而延遲則在1-10 ms之間,成本較高。這使得5G非常適合需要高時序精密度的應用,例如異質設備協作、遠端操控、混合現實(MR)巡檢維修等。此外,5G網路在大型

Save Logs

In Doc_QA, Eval and Summary, if you set keep_logs to True, each time you run any function from akasha it saves logs that record the parameters and the results of that run. Each run has a timestamp; you can use {obj_name}.timestamp_list to list them, and use a timestamp to find the log of the run you want to see.

You can also save logs to a .txt file or a .json file.

qa = akasha.Doc_QA(verbose=False, search_type="merge", max_doc_len=1500, keep_logs=True)
query1 = "五軸是什麼"
qa.get_response(doc_path="./doc/mic/", prompt = query1)
qa.get_response(doc_path="./doc/mic/", prompt = query1)

tp = qa.timestamp_list
## ["2023/09/26, 10:52:36", "2023/09/26, 10:59:49", "2023/09/26, 11:09:23"]

## {"fn_type":"get_response","search_type":"merge", "max_doc_len":1500,....."response":....}



Use AiiDO to record experiment

If you want to record experiment metrics and results, you need to create a project on the AiiDO platform. Once done, you will receive all the necessary parameters for automatically uploading the experiment.

Create a .env file in the same directory as your program, and paste in all the parameters.

.env file


After you create the .env file, you can use record_exp to set your experiment name, and it will automatically record experiment metrics and results to the mlflow server.

import akasha
import os
from dotenv import load_dotenv

load_dotenv()  # read the parameters from the .env file into the environment

os.environ["OPENAI_API_KEY"] = "your openAI key"

dir_path = "doc/"
prompt = "「塞西莉亞花」的花語是什麼?	「失之交臂的感情」	「赤誠的心」	「浪子的真情」	「無法挽回的愛」"
exp_name = "exp_akasha_get_response"
ak = akasha.Doc_QA(record_exp=exp_name)
response = ak.get_response(dir_path, prompt)

In the experiment you assign, the run name is the combination of the embedding, search type and model name used.


You can also compare the responses from different models, search types and embeddings.


Auto Evaluation

To evaluate the performance of the current parameters, you can use the function auto_evaluation. First you need to build a question set .txt file based on the documents you want to use. You can generate either a single choice question file or an essay question file.

  1. For a single choice question file, the options and the correct answer are separated by tabs (\t), and each line is one question, for example: (question_pvc.txt)
應回收廢塑膠容器材質種類不包含哪種?	聚丙烯(PP)	聚苯乙烯(PS)	聚氯乙烯(PVC)	低密度聚乙烯(LDPE)	4
庫存盤點包括庫存全盤作業及不定期抽盤作業,盤點計畫應包括下列項目不包含哪項?	盤點差異之處理	盤點清冊	各項物品存放區域配置圖	庫存全盤日期及參加盤點人員名單	1
以下何者不是環保署指定之公民營地磅機構?	中森加油站企業有限公司	台益地磅站	大眾地磅站	新福行	4
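The line format above can be read with a few lines of Python. The sketch assumes the last tab-separated field is the 1-based number of the correct option, which matches the sample lines; the function name parse_single_choice_line is hypothetical and not part of akasha.

```python
def parse_single_choice_line(line: str) -> dict:
    """Parse one tab-separated line: question, options..., answer number."""
    fields = [f.strip() for f in line.rstrip("\n").split("\t")]
    if len(fields) < 3:
        raise ValueError("expected question, at least one option, and an answer")
    question, *options, answer = fields
    index = int(answer)  # assumed to be the 1-based position of the correct option
    if not 1 <= index <= len(options):
        raise ValueError(f"answer {index} is outside 1..{len(options)}")
    return {"question": question, "options": options, "answer": index}
```

A quick check like this before running an evaluation can catch lines where spaces were typed instead of tabs.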

auto_evaluation returns the correct rate and the tokens used for the question set. Details of each question are saved in the logs, or on the mlflow server if you turn on record_exp.

import akasha.eval as eval
import os
from dotenv import load_dotenv

os.environ["OPENAI_API_KEY"] = "your openAI key"
dir_path = "doc/pvc/"
exp_name = "exp_akasha_auto_evaluation"

eva = eval.Model_Eval(question_style="single_choice", search_type='merge',\
    model="openai:gpt-3.5-turbo", embeddings="openai:text-embedding-ada-002",record_exp=exp_name)
print(eva.auto_evaluation("question_pvc.txt", dir_path ))
## correct rate: 0.9, tokens: 3228 ##

  2. For an essay question file, each question has "問題:" before it, and each reference answer has "答案:" before it. Questions are separated by two newlines (\n\n).
問題:根據文件中的訊息,智慧製造的複雜性已超越系統整合商的負荷程度,未來產業鏈中的角色將傾向朝共和共榮共創智慧製造商機,而非過往的單打獨鬥模式發展。請問為什麼  供  應商、電信商、軟體開發商、平台商、雲端服務供應商、系統整合商等角色會傾向朝共和共榮共創智慧製造商機的方向發展?

答案:根據文件中的資訊,NVIDIA的邊緣運算產品包括Jetson系列和EGX系列,而IBM的邊緣運算產品包括IBM Edge Application Manager和IBM Watson Anywhere。
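A file in this layout can be split into question and answer pairs by looking at the two prefixes. The sketch below assumes each question and each answer fits on one line; parse_essay_questionset is a hypothetical helper, not an akasha function.

```python
Q_PREFIX = "問題:"
A_PREFIX = "答案:"


def parse_essay_questionset(text: str) -> list:
    """Return [(question, reference_answer), ...] from an essay question set."""
    pairs = []
    question = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(Q_PREFIX):
            question = line[len(Q_PREFIX):].strip()
        elif line.startswith(A_PREFIX) and question is not None:
            # pair the answer with the most recent unanswered question
            pairs.append((question, line[len(A_PREFIX):].strip()))
            question = None
    return pairs
```

An answer line with no preceding question is ignored rather than paired with the wrong question.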

Use llm to create questionset and evaluate the performance

If you prefer not to create your own question set to assess the performance of the current parameters, you can utilize the eval.auto_create_questionset feature to automatically generate a question set along with reference answers. Subsequently, you can use eval.auto_evaluation to obtain metrics scores such as Bert_score, Rouge, and LLM_score for essay questionset and correct rate for single choice questionset. These scores range from 0 to 1, with higher values indicating that the generated response closely matches the reference answers.
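To give a feel for what an overlap-based score between 0 and 1 looks like, here is a minimal sketch of a ROUGE-1 style F1 over pre-tokenized text. It is a simplified illustration only: the exact ROUGE variant and tokenization behind akasha's reported score are not stated here, and rouge1_f1 is a hypothetical name.

```python
from collections import Counter


def rouge1_f1(candidate: list, reference: list) -> float:
    """Unigram-overlap F1 between two token lists (0.0 to 1.0)."""
    if not candidate or not reference:
        return 0.0
    # multiset intersection counts each shared token at most as often as it appears in both
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

A response identical to the reference scores 1.0, and a response sharing no tokens with it scores 0.0.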

For example, the code below creates a question set text file 'questionset/mic_essay.txt' with ten questions and reference answers; each question is randomly generated from the content segments of the documents in the 'doc/mic/' directory. You can then use the question set file to evaluate the performance of the parameters you want to test.

import akasha.eval as eval

eva = eval.Model_Eval(question_style="essay", search_type='merge',\
      model="openai:gpt-3.5-turbo", embeddings="openai:text-embedding-ada-002",record_exp="exp_mic_auto_questionset")

eva.auto_create_questionset(doc_path="doc/mic/", question_num=10, output_file_path="questionset/mic_essay.txt")

bert_score, rouge, llm_score, tol_tokens = eva.auto_evaluation(questionset_file="questionset/mic_essay.txt", doc_path="doc/mic/", question_style = "essay", record_exp="exp_mic_auto_evaluation",search_type="svm")

# bert_score = 0.782
# rouge = 0.81
# llm_score = 0.393

Use different question types to test different abilities of LLM

The question_type parameter offers four question types: fact, summary, irrelevant and compared. The default is fact.

import akasha.eval as eval

eva = eval.Model_Eval(search_type='merge', question_type = "irrelevant", model="openai:gpt-3.5-turbo", record_exp="exp_mic_auto_questionset")

eva.auto_create_questionset(doc_path="doc/mic/", question_num=10, output_file_path="questionset/mic_irre.txt")

bert_score, rouge, llm_score, tol_tokens = eva.auto_evaluation(questionset_file="questionset/mic_irre.txt", doc_path="doc/mic/", question_style = "essay", record_exp="exp_mic_auto_evaluation",search_type="svm")

Assign a certain topic for the question set

If you want to generate questions on a certain topic, you can use the create_topic_questionset function; it uses the topic input to find related texts in the documents and generates a question set.

import akasha.eval as eval

eva = eval.Model_Eval(search_type='merge', question_type = "irrelevant", model="openai:gpt-3.5-turbo", record_exp="exp_mic_auto_questionset")

eva.create_topic_questionset(doc_path="doc/mic/", topic= "工業4.0", question_num=3, output_file_path="questionset/mic_topic_irre.txt")

bert_score, rouge, llm_score, tol_tokens = eva.auto_evaluation(questionset_file="questionset/mic_topic_irre.txt", doc_path="doc/mic/", question_style = "essay", record_exp="exp_mic_auto_evaluation",search_type="svm")

Find Optimum Combination

To test all available combinations and find the best parameters, you can use the function optimum_combination. You can give different embeddings, document chunk sizes, models and document similarity search types, and the function will test all combinations to find the best one based on the given question set and documents.

Note that the best score combination is the combination with the highest correct rate, and the best cost-effective combination is the one that needs the fewest tokens to get a correct answer.
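The selection logic described above can be sketched as an exhaustive grid search. Here evaluate is a stand-in callable that returns (correct_rate, tokens_used) for one combination, and "cost-effective" is approximated as tokens divided by correct rate; both are assumptions for illustration, not akasha's internal code.

```python
from itertools import product


def best_combinations(evaluate, embeddings_list, chunk_size_list, model_list, search_type_list):
    """Try every combination; `evaluate` returns (correct_rate, tokens_used)."""
    results = []
    for combo in product(embeddings_list, chunk_size_list, model_list, search_type_list):
        correct_rate, tokens = evaluate(*combo)
        results.append((combo, correct_rate, tokens))
    top_rate = max(rate for _, rate, _ in results)
    # all combinations that tie for the highest correct rate
    best_score = [combo for combo, rate, _ in results if rate == top_rate]
    # tokens spent per unit of correct rate (lower is better)
    scored = [(tokens / rate, combo) for combo, rate, tokens in results if rate > 0]
    best_cost = min(scored, key=lambda pair: pair[0])[1] if scored else None
    return top_rate, best_score, best_cost
```

Because the number of runs is the product of the list lengths, keeping each list short matters when every run calls an LLM.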

import akasha.eval as eval
import os
from dotenv import load_dotenv

os.environ["OPENAI_API_KEY"] = "your openAI key"
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "your huggingface key"
dir_path = "doc/pvc/"
exp_name = "exp_akasha_optimum_combination"
embeddings_list = ["hf:shibing624/text2vec-base-chinese", "openai:text-embedding-ada-002"]
model_list = ["openai:gpt-3.5-turbo","hf:FlagAlpha/Llama2-Chinese-13b-Chat-4bit","hf:meta-llama/Llama-2-7b-chat-hf",\
            "llama-gpu:model/llama-2-7b-chat.Q5_K_S.gguf", "llama-gpu:model/llama-2-13b-chat.Q5_K_S.gguf"]

eva = eval.Model_Eval(question_style="single_choice")
eva.optimum_combination("question_pvc.txt", dir_path,  embeddings_list = embeddings_list, model_list = model_list,
            chunk_size_list=[200, 400, 600], search_type_list=["merge","tfidf",],record_exp=exp_name)

The result would look like below

Best correct rate:  1.000
Best score combination:  

embeddings: openai:text-embedding-ada-002, chunk size: 400, model: openai:gpt-3.5-turbo, search type: merge


embeddings: openai:text-embedding-ada-002, chunk size: 400, model: openai:gpt-3.5-turbo, search type: tfidf



Best cost-effective:

embeddings: hf:shibing624/text2vec-base-chinese, chunk size: 400, model: openai:gpt-3.5-turbo, search type: tfidf

### Arguments of Model_Eval class ###
            **embeddings (str, optional)**: the embeddings used in query and vector storage. Defaults to "text-embedding-ada-002".
            **chunk_size (int, optional)**: chunk size of texts from documents. Defaults to 1000.
            **model (str, optional)**: llm model to use. Defaults to "gpt-3.5-turbo".
            **verbose (bool, optional)**: show log texts or not. Defaults to False.
            **threshold (float, optional)**: the similarity threshold of searching. Defaults to 0.2.
            **language (str, optional)**: the language of documents and prompt, use to make sure docs won't exceed
                max token size of llm input.
            **search_type (str, optional)**: search type to find similar documents from db, default 'merge'.
                includes 'merge', 'mmr', 'svm', 'tfidf', also, you can custom your own search_type function, as long as your
                function input is (query_embeds:np.array, docs_embeds:list[np.array], k:int, relevancy_threshold:float, log:dict) 
                and output is a list [index of selected documents].
            **record_exp (str, optional)**: use aiido to save running params and metrics to the remote mlflow or not if record_exp not empty, and set record_exp as experiment name.  default "".
            **system_prompt (str, optional)**: the system prompt that you assign special instruction to llm model, so will not be used
                in searching relevant documents. Defaults to "".
            **max_doc_len (int, optional)**: max document size of llm input. Defaults to 3000.
            **temperature (float, optional)**: temperature of llm model from 0.0 to 1.0 . Defaults to 0.0.
            **question_style (str, optional)**: the style of question you want to generate, "essay" or "single_choice". Defaults to "essay".
            **use_rerank (bool, optional)**: use rerank model to re-rank the selected documents or not. Defaults to False.

File Summarization

To create a summary of a text file in various formats like .pdf, .txt, or .docx, you can use the Summary.summarize_file function. For example, the following code employs the map_reduce summary method to instruct LLM to generate a summary of approximately 500 words.

There are two summary types, map_reduce and refine. map_reduce summarizes every text chunk and then uses all the chunk summaries to generate a final summary; refine summarizes one text chunk at a time, using the previous summary as a prompt for summarizing the next segment, to get a higher level of summary consistency.
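The difference between the two strategies is easiest to see in code. In this sketch summarize is any callable that maps text to a shorter text (in practice an LLM call); the function names are hypothetical and the prompts that akasha actually sends are not reproduced.

```python
def map_reduce_summary(chunks, summarize):
    """Summarize every chunk independently, then summarize the summaries."""
    partial = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial))


def refine_summary(chunks, summarize):
    """Carry a running summary forward and refine it with each new chunk."""
    summary = ""
    for chunk in chunks:
        # the previous summary is placed in front of the next chunk
        text = chunk if not summary else summary + "\n" + chunk
        summary = summarize(text)
    return summary
```

map_reduce makes one call per chunk plus a final call, and the per-chunk calls are independent of each other; refine makes one call per chunk, but each call depends on the previous one.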

import akasha
summ = akasha.Summary(chunk_size=1000, chunk_overlap=100)  # named summ to avoid shadowing the built-in sum()
summ.summarize_file(file_path="doc/mic/5軸工具機因應市場訴求改變的發展態勢.pdf",
                    summary_type="map_reduce", summary_len=500, chunk_overlap=40)

### Arguments of Summary class ###
            **chunk_size (int, optional)**: chunk size of texts from documents. Defaults to 1000.
            **chunk_overlap (int, optional)**: chunk overlap of texts from documents. Defaults to 40.
            **model (str, optional)**: llm model to use. Defaults to "gpt-3.5-turbo".
            **verbose (bool, optional)**: show log texts or not. Defaults to False.
            **threshold (float, optional)**: the similarity threshold of searching. Defaults to 0.2.
            **language (str, optional)**: the language of documents and prompt, use to make sure docs won't exceed
                max token size of llm input.
            **record_exp (str, optional)**: use aiido to save running params and metrics to the remote mlflow or not if record_exp not empty, and set record_exp as experiment name.  default "".
            **system_prompt (str, optional)**: the system prompt that you assign special instruction to llm model, so will not be used
                in searching relevant documents. Defaults to "".
            **max_doc_len (int, optional)**: max document length of llm input. Defaults to 3000.
            **temperature (float, optional)**: temperature of llm model from 0.0 to 1.0 . Defaults to 0.0.


By implementing an agent, you give the LLM the ability to use tools to accomplish tasks. You can provide tools for tasks such as file editing or Google searches, and have the LLM help execute tasks rather than only respond to your questions.

In example 1, we create a tool that collects user input. We also give the agent a tool that stores text data in a JSON file. After creating the agent, we instruct it to ask the user a list of questions and save the responses to a file named default.json.
def input_func(question: str):
    response = input(question)
    return str({"question": question, "answer": response})

input_tool = akasha.create_tool(
    "This is the tool to ask user question, the only one param question is the question string that has not been answered and we want to ask user.",

ao = akasha.test_agent(verbose=True,
    ao("逐個詢問使用者以下問題,若所有問題都回答了,則將所有問題和回答儲存成default.json並結束。問題為:1.房間燈關了嗎? \n2. 有沒有人在家?  \n3.有哪些電器開啟?\n"
I have successfully saved all the questions and answers into the "default.json" file. The conversation is now complete.

### default.json ###
[
    {
        "question": "房間燈關了嗎?",
        "answer": "no"
    },
    {
        "question": "有沒有人在家?",
        "answer": "no"
    },
    {
        "question": "有哪些電器開啟?",
        "answer": "phone, shower"
    }
]

In example 2, we add a Wikipedia tool that lets the LLM access the Wikipedia API to retrieve the information it needs to answer the question. Since the response from Wikipedia may contain redundant information, we can use retri_observation to retrieve only the relevant information.

ao = akasha.test_agent(
根據查到的資訊,李遠哲(Yuan T. Lee)比黃仁勳(Jensen Huang)更老。李遠哲於1936年11月19日出生,而黃仁勳的出生日期是1963年2月17日。我已將這些資訊儲存成名為"AGE.json"的

### AGE.json ###
{
    "李遠哲": "1936-11-19",
    "黃仁勳": "1963-02-17",
    "答案": "李遠哲比黃仁勳更老"
}


If you want to get the LLM response in real time, you can use the stream function; it returns each round of the LLM response as a generator.

ao = akasha.test_agent(
st ="請用中文回答李遠哲跟黃仁勳誰比較老?將查到的資訊和答案儲存成json檔案,檔名為AGE.json")
for s in st:

Custom Search Type, Embeddings and Model

In case you want to use other search types, embeddings, or language models, you can provide your own functions as parameters for search_type, embeddings and model.

Custom Search Type

If you want to devise your own method for identifying the most relevant documents, you can pass your custom function as the search_type parameter.

In the 'cust' function, we employ the Euclidean distance metric to identify the most relevant documents. It returns a list of indices representing the top k documents with distances between the query and document embeddings smaller than the specified threshold.

Here's a breakdown of the parameters:
query_embeds: Embeddings of the query. (numpy array)
docs_embeds: Embeddings of all documents. (list of numpy arrays representing document embeddings)
k: Number of most relevant documents to be selected. (integer)
relevancy_threshold: Threshold for relevancy. If the distance between the query and a document is smaller than relevancy_threshold, the document is selected. (float)
log: A dictionary that can be used to record any additional information you desire. (dictionary)

def cust(query_embeds, docs_embeds, k:int, relevancy_threshold:float, log:dict):
    from scipy.spatial.distance import euclidean
    import numpy as np
    distance = [[euclidean(query_embeds, docs_embeds[idx]),idx] for idx in range(len(docs_embeds))]
    distance = sorted(distance, key=lambda x: x[0])
    ## change dist if embeddings not between 0~1
    max_dist = 1
    while max_dist < distance[-1][0]:
        max_dist *= 10
        relevancy_threshold *= 10
    ## add log para
    log['dd'] = "miao"
    return  [idx for dist,idx in distance[:k] if (max_dist - dist) >= relevancy_threshold]

doc_path = "./mic/"
prompt = "五軸是什麼?"

qa = akasha.Doc_QA(verbose=True, search_type = cust, embeddings="hf:shibing624/text2vec-base-chinese")
qa.get_response(doc_path= doc_path, prompt = prompt)
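As a variation on the 'cust' function, the sketch below ranks documents by cosine similarity instead of Euclidean distance while keeping the same five-parameter signature and the same return type (a list of document indices). It uses only the standard library, so it works with plain lists as well as numpy arrays; the name cosine_search and the log key are hypothetical.

```python
import math


def cosine_search(query_embeds, docs_embeds, k: int, relevancy_threshold: float, log: dict):
    """Return indices of the top-k documents whose cosine similarity to the
    query is at least relevancy_threshold."""
    def cosine(a, b):
        dot = sum(float(x) * float(y) for x, y in zip(a, b))
        na = math.sqrt(sum(float(x) ** 2 for x in a))
        nb = math.sqrt(sum(float(y) ** 2 for y in b))
        return dot / (na * nb) if na and nb else 0.0

    # highest similarity first
    scored = sorted(
        ((cosine(query_embeds, emb), idx) for idx, emb in enumerate(docs_embeds)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # record the scores of the candidates that were considered
    log["cosine_scores"] = [round(score, 4) for score, _ in scored[:k]]
    return [idx for score, idx in scored[:k] if score >= relevancy_threshold]
```

It would be passed the same way as 'cust', for example search_type=cosine_search. Unlike the distance-based version, a higher score here means a closer match, so the threshold acts as a minimum similarity.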

Custom Embeddings

If you want to use other embeddings, you can write your own embedding function and pass it as the embeddings parameter.

For example, in the 'test_embed' function, we use the SentenceTransformer model to generate embeddings for the given texts. You can directly use 'test_embed' as the embeddings parameter and call the 'get_response' function.

Here's a breakdown of the parameters:
texts: A list of texts to be embedded.

def test_embed(texts:list)->list:

    from sentence_transformers import SentenceTransformer
    mdl = SentenceTransformer('BAAI/bge-large-zh-v1.5')
    embeds =  mdl.encode(texts,normalize_embeddings=True)

    return embeds

doc_path = "./mic/"
prompt = "五軸是什麼?"

qa = akasha.Doc_QA(verbose=True, search_type = "svm", embeddings = test_embed)
qa.get_response(doc_path= doc_path, prompt = prompt)
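When experimenting offline, a dependency-free function with the same shape (a list of texts in, one vector per text out) can be handy. The toy character-hashing embedding below is only a stand-in for testing the plumbing: it carries no semantic meaning, and whether akasha accepts plain Python lists instead of numpy arrays is an assumption.

```python
import hashlib
import math


def hashed_embed(texts: list, dim: int = 64) -> list:
    """Toy bag-of-characters embedding: one L2-normalized vector per text."""
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for ch in text:
            # stable bucket per character (the built-in hash() is salted per process)
            digest = hashlib.sha256(ch.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors
```

For real retrieval quality, a trained model such as the one used in 'test_embed' is the appropriate choice.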

Custom Model

If you want to use other language models, you can write your own model function and pass it as the model parameter.

For example, in the 'test_model' function, we use the OpenAI model to generate a response for the given prompt. You can directly use 'test_model' as the model parameter and call the 'get_response' function.

Here's a breakdown of the parameters:
prompt: A string representing the prompt for the language model.

def test_model(prompt:str):
    import openai
    from langchain.chat_models import ChatOpenAI
    openai.api_type = "open_ai"
    model = ChatOpenAI(model="gpt-3.5-turbo", temperature = 0)
    ret = model.predict(prompt)
    return ret

doc_path = "./mic/"
prompt = "五軸是什麼?"  # "What is five-axis?"

qa = akasha.Doc_QA(verbose=True, search_type = "svm", model = test_model)
qa.get_response(doc_path= doc_path, prompt = prompt)
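
A custom model function only needs to accept a prompt string and return a string, so an offline stub with the same shape as 'test_model' can be handy for testing without API calls. The sketch below is hypothetical: `make_stub_model` and `stub_model` are illustrative names, not part of akasha.

```python
def make_stub_model(canned_answer: str):
    """Build a hypothetical offline model function shaped like test_model."""
    seen_prompts = []  # records every prompt the stub receives

    def stub_model(prompt: str) -> str:
        seen_prompts.append(prompt)
        return f"{canned_answer} (prompt length: {len(prompt)} characters)"

    return stub_model, seen_prompts

stub_model, seen_prompts = make_stub_model("stub answer")
reply = stub_model("What is five-axis?")
print(reply)
```

Inspecting `seen_prompts` afterwards shows exactly what text was sent to the model function.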

Stream Output

If you want to stream output to your web pages or API, you can get a generator of the LLM response for openai, huggingface, and remote models. The example below uses streamlit's write_stream to show the response.

import streamlit as st
import akasha
import gc, torch

if "pre" not in st.session_state:
    st.session_state.pre = ""
if "model_obj" not in st.session_state:
    st.session_state.model_obj = None
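def clean():
    # free memory held by the previously loaded model
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

def stream_response(prompt:str, model_name:str="openai:gpt-3.5-turbo"):
    # Mistral-7B-Instruct-v0.3   Llama3-8B-Chinese-Chat
    mdl_type = model_name.split(':')[0]
    # assumption: the model object exposes a stream(prompt) method that returns a generator
    streaming = st.session_state.model_obj.stream(prompt)
    for s in streaming:
        if mdl_type == "openai":
            # openai chunks are message objects; the text is in .content
            yield s.content
        else:
            yield s

model = st.selectbox("select model", ["openai:gpt-3.5-turbo","hf:model/Mistral-7B-Instruct-v0.3"])
prompt = st.chat_input("Say something")
if st.session_state.pre != model:
    # drop the old model and release its memory before loading the new one
    st.session_state.model_obj = None
    clean()
    st.session_state.model_obj = akasha.helper.handle_model(model, False, 0.0)
    st.session_state.pre = model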

if prompt:
    st.write("question: " + prompt)
    st.write_stream(stream_response(prompt, model))
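
The chunk handling in the example above can also be written without branching on the model type. The sketch below is one possible variation, not akasha's own code: `iter_text` is a hypothetical helper, and the fake streams stand in for real model output so that it runs without streamlit or a model.

```python
from types import SimpleNamespace

def iter_text(chunks):
    """Yield plain text from a stream of strings or objects with a .content attribute."""
    for chunk in chunks:
        # Chat-style chunks carry their text in .content; plain strings pass through.
        yield getattr(chunk, "content", chunk)

fake_chat_stream = [SimpleNamespace(content="Hel"), SimpleNamespace(content="lo")]
fake_text_stream = ["Hel", "lo"]
chat_text = "".join(iter_text(fake_chat_stream))
plain_text = "".join(iter_text(fake_text_stream))
print(chat_text, plain_text)
```

A generator like this can be passed to st.write_stream in the same way as stream_response.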

Command Line Interface

You can also use akasha from the command line. For example, keep-responsing creates a document QA model and lets you keep asking questions, answering each one from the documents in the directory given by -d.

$ akasha keep-responsing -d ../doc/plc/  -c 400 -k 1
Please input your question(type "exit()" to quit) : 應回收廢塑膠容器材質種類不包含哪種?  聚丙烯(PP) 聚苯乙烯(PS) 聚氯乙烯(PVC)  低密度聚乙烯(LDPE)
Response:  應回收廢塑膠容器材質種類不包含低密度聚乙烯(LDPE)。

Please input your question(type "exit()" to quit) : 所謂市盈率,是指每股市價除以每股盈餘,也就是股票的?   本益比  帳面值比  派息   資金

Please input your question(type "exit()" to quit) : exit()

Currently the available commands are get-response, keep-responsing, chain-of-thought, auto_create_questionset, and auto_evaluation.

$ akasha keep-responsing --help
Usage: akasha keep-responsing [OPTIONS]

  -d, --doc_path TEXT         document directory path, parse all .txt, .pdf,
                              .docx files in the directory  [required]
  -e, --embeddings TEXT       embeddings for storing the documents
  -c, --chunk_size INTEGER    chunk size for storing the documents
  -m, --model TEXT            llm model for generating the response
  -ur, --use_rerank BOOL      use rerank to sort the documents
  -t, --threshold FLOAT       threshold score for selecting the relevant
  -l, --language TEXT         language for the documents, default is 'ch' for
  -s, --search_type TEXT      search type for the documents, include merge,
                              svm, mmr, tfidf
  -sys, --system_prompt TEXT  system prompt for the llm model
  -md, --max_doc_len INTEGER  max document length for the llm model input
  --help                      Show this message and exit.
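
When launching the CLI from a script, it can be convenient to assemble the argument list programmatically. The sketch below is a hypothetical helper, not part of akasha: `build_keep_responsing_cmd` is an illustrative name, and it uses only the -d, -c, -s, and -m flags listed in the help text above.

```python
import shlex

def build_keep_responsing_cmd(doc_path, chunk_size=None, search_type=None, model=None):
    """Assemble an argument list for `akasha keep-responsing`."""
    cmd = ["akasha", "keep-responsing", "-d", doc_path]
    if chunk_size is not None:
        cmd += ["-c", str(chunk_size)]
    if search_type is not None:
        cmd += ["-s", search_type]
    if model is not None:
        cmd += ["-m", model]
    return cmd

cmd = build_keep_responsing_cmd("../doc/plc/", chunk_size=400, search_type="svm")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run; since keep-responsing is interactive, the child process should inherit the terminal's standard input.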


If you prefer running Akasha via a web page, we offer a Streamlit-based user interface.

To start the application, use the following command:

$ akasha ui

You should now be able to access the web page at http://localhost:8501/.

You can start by going to the Settings page to configure your settings.

The first option, Document Path, specifies the directory where you want the LLM to search for documents.

You can either add document files and name the directory from the Upload Files page or place the directory containing documents in the ./docs/ directory.


You can download the models you want into the model/ directory, and they will be added to the Language Model option on the Settings page.


The default setting is to use the OpenAI model and embeddings, so please remember to add your OpenAI API key on the left side.


After you have finished setting up, you can start using Akasha.

For example, you can ask the language model '五軸是什麼' ("What is five-axis?") and add a system prompt telling the model to answer in Chinese.

Note the difference between a prompt and a system prompt: the system prompt is not used to search for similar documents. It only defines the format or type of response you expect from the language model for a given prompt.
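
The distinction between the two prompts can be illustrated with a toy pipeline. The sketch below is not akasha's implementation: `retrieve` and `build_llm_input` are hypothetical names, and simple word overlap stands in for embedding-based search. It only shows that the prompt alone drives retrieval, while the system prompt is added to the final model input.

```python
def retrieve(prompt, docs, k=1):
    """Rank documents by word overlap with the prompt only."""
    query_words = set(prompt.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_llm_input(system_prompt, prompt, docs):
    """Combine the system prompt, the retrieved documents, and the question."""
    context = "\n".join(retrieve(prompt, docs))
    return f"{system_prompt}\n\nDocuments:\n{context}\n\nQuestion: {prompt}"

docs = [
    "five-axis machines move along five axes",
    "plastic containers are recycled by resin type",
]
text = build_llm_input("Answer in Chinese.", "what are five-axis machines", docs)
print(text)
```

Changing the system prompt here changes the text sent to the model but never changes which document is selected.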


