OneContext
LLM Context as a Service
OneContext makes it fast and easy to augment your LLM application with your own data
in a few API calls. Upload your data to a Knowledge Base
and query it directly with natural language to retrieve relevant context for your LLM application.
We manage the full document processing and retrieval pipeline so that you don't have to:
- document ingestion, chunking, and cleaning
- efficient vector embeddings at scale using state-of-the-art open-source models
- a low-latency, multi-stage query pipeline that provides the most relevant context for your LLM application
We keep up with the latest research to provide an accurate and fast retrieval pipeline based on model evaluation and best-practice heuristics.
Multi-stage query pipeline out of the box:
- A fast base model retrieves a large pool of documents
- A cross-encoder reranks the retrieved documents to return the results most relevant to the query
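The two stages above can be sketched in plain Python. This is a toy illustration of the retrieve-then-rerank idea, not OneContext's internals: a cheap scorer shortlists a large pool, then a more expensive scorer (standing in for a cross-encoder) reorders the shortlist.

```python
def cheap_score(query: str, doc: str) -> int:
    # Fast first-stage proxy: count words shared between query and document.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def expensive_score(query: str, doc: str) -> float:
    # Stand-in for a cross-encoder: weight word overlap by document brevity.
    return cheap_score(query, doc) / (1 + len(doc.split()))

def query_pipeline(query, docs, rerank_pool_size=4, output_k=2):
    # Stage 1: retrieve a large candidate pool with the cheap scorer.
    pool = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)
    pool = pool[:rerank_pool_size]
    # Stage 2: rerank only the pool with the expensive scorer.
    reranked = sorted(pool, key=lambda d: expensive_score(query, d), reverse=True)
    return reranked[:output_k]

docs = [
    "onecontext manages retrieval pipelines",
    "a long document about retrieval pipelines and many other unrelated things entirely",
    "cooking recipes",
    "retrieval",
]
top = query_pipeline("retrieval pipelines", docs)
```

The point of the split is cost: the expensive scorer only ever sees `rerank_pool_size` candidates, however large the corpus is.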
Use Cases:
- Question answering over a large knowledge base
- Long-term memory for chatbots
- Runtime context for instruction-following agents
- Preventing and detecting hallucinations based on custom data
Quick Start
pip install onecontext
Configuration
export ONECONTEXT_API_KEY="YOUR_API_KEY"
You can get an API key by joining our closed beta. Email Ross at ross@onecontext.ai to get on the list.
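Before making calls, it can be worth checking from Python that the key is visible to your process (an illustrative sketch; the `setdefault` line is only there so the snippet runs standalone, and in practice the variable comes from your shell profile or a secrets manager):

```python
import os

# Illustrative default so the snippet runs on its own; normally you export
# ONECONTEXT_API_KEY in your shell as shown above.
os.environ.setdefault("ONECONTEXT_API_KEY", "YOUR_API_KEY")

api_key = os.environ["ONECONTEXT_API_KEY"]
assert api_key, "ONECONTEXT_API_KEY must be set before using onecontext"
```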
Usage
Create a Knowledge Base:
from onecontext import KnowledgeBase
my_knowledge_base = KnowledgeBase("my_knowledge_base")
my_knowledge_base.create()
List Knowledge Bases
from onecontext import list_knowledge_bases
print(list_knowledge_bases())
Upload files to the Knowledge Base:
You can upload an entire directory like this:
my_kb = KnowledgeBase("my_knowledge_base")
directory = "/path/to/local_directory"
my_kb.upload_from_directory(directory)
Or, you can upload an individual file like this:
my_kb = KnowledgeBase("my_knowledge_base")
my_kb.upload_file(
"/path/to/local_file.pdf"
)
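If you only want to upload a subset of a directory (say, just the PDFs), one approach is to filter paths with `pathlib` and call `upload_file` per match. A hedged sketch: the directory name is a placeholder, and the actual upload calls are commented out since they need a live API key.

```python
from pathlib import Path

# from onecontext import KnowledgeBase
# my_kb = KnowledgeBase("my_knowledge_base")

directory = Path("local_directory")  # placeholder path
pdf_paths = sorted(directory.glob("**/*.pdf"))  # recursive match on .pdf files
for path in pdf_paths:
    # my_kb.upload_file(str(path))
    print(f"would upload {path}")
```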
If you like, you can also add metadata to your files. This makes it really easy to filter your query-space later on. Metadata can be any key-value pairs, passed as a dictionary. For example:
my_kb = KnowledgeBase("my_knowledge_base")
my_kb.upload_file(
"/path/to/local_file.pdf", metadata={"ContainsPII": True, "author": "ross", "description": "passport", "file-type": "scan", "category": "personal"}
)
Currently you can upload .pdf, .docx, and .txt files. Don't worry if a PDF is a scan without easily extractable text: OneContext will figure it out via OCR. In the near future you'll be able to upload video and audio, and connect to multiple file-storage platforms.
Once the files have been uploaded, they will be processed, chunked, and embedded by OneContext.
Check sync status:
print(my_knowledge_base.is_synced)
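Processing is asynchronous, so a simple way to block until the knowledge base is ready is to poll `is_synced`. A minimal sketch, assuming `is_synced` flips to `True` once processing finishes; the timeout and interval are illustrative helpers, not SDK parameters.

```python
import time

def wait_until_synced(kb, timeout: float = 300.0, poll_interval: float = 5.0) -> bool:
    """Poll kb.is_synced until it is True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if kb.is_synced:
            return True
        time.sleep(poll_interval)
    return False
```

`wait_until_synced(my_knowledge_base)` would then return `True` once your uploads are processed, or `False` if the timeout elapses first.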
Query the Knowledge Base
from onecontext import Retriever
retriever = Retriever(knowledge_bases=[my_kb])
documents = retriever.query("what is onecontext?", output_k=20)
And, filtering by metadata:
from onecontext import Retriever
retriever = Retriever(knowledge_bases=[my_kb])
documents = retriever.query("what is onecontext?", output_k=20, metadata_filters={"ContainsPII": True, "author": "ross"})
By default, the query pipeline is composed of two steps:
- Retrieval: fetch a larger pool of candidate documents (controlled by rerank_pool_size)
- Re-ranking: re-rank the candidates with a downstream model to return the most relevant documents
To improve recall you can increase the rerank_pool_size:
documents = retriever.query("what is onecontext?", output_k=10, rerank_pool_size=80)
You can also skip the re-ranking step entirely if you want to prioritise speed over accuracy of results.
documents = retriever.query_no_rerank("what is onecontext?")
License
onecontext is distributed under the terms of the MIT license.
Hashes for onecontext-0.0.10-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 2d3364552ac0eb3dc6a0cbca1e879b383ece6c6cd82919c23351939e218b953b
MD5 | 8fd72ddd7a7f111044266b1e4a18bc1c
BLAKE2b-256 | 9b3bf09d25e14af6bf5719e24251623a6fd7cc203f7787619439e8d036edd4d3