Document ingestion and processing package with vector store and graph capabilities
Zencognify
A Python package for document ingestion and processing with vector store capabilities.
Installation
```bash
pip install zencognify
```
Features
- Document processing from URLs (PDF, DOC, etc.)
- Vector storage with Qdrant
- Bulk document upload
- Document chunking and metadata management
- OpenAI embeddings integration
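The README does not say how documents are chunked internally. As a rough illustration only, the sketch below splits text into fixed-size character windows with overlap and attaches per-chunk metadata. The names `Chunk` and `chunk_text`, the `chunk_index` and `start` metadata keys, and the default sizes are hypothetical. Only the `source` key appears in this README. zencognify may use a different splitter, such as one from LangChain.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of a document plus the metadata stored alongside it."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, source: str, chunk_size: int = 1000, overlap: int = 200) -> list[Chunk]:
    """Split text into overlapping windows of at most chunk_size characters.

    Hypothetical helper for illustration; not part of zencognify's API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(Chunk(text[start:end], {"source": source, "chunk_index": len(chunks), "start": start}))
        if end == len(text):  # last window reached the end; avoid a redundant tail chunk
            break
        start += step
    return chunks


parts = chunk_text("abcdefghij", source="https://example.com/document1.pdf", chunk_size=4, overlap=2)
print([c.text for c in parts])  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps a sentence that straddles a window boundary fully present in at least one chunk, at the cost of storing some text twice.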
Usage
Basic Setup
```python
from cognify import VectorStore

vector_store = VectorStore(qdrant_url="http://localhost:6333")
```
Create Collection
```python
vector_store.create_collection("my_documents")
```
Bulk Upload Documents
```python
urls = [
    "https://example.com/document1.pdf",
    "https://example.com/document2.pdf",
]
results = vector_store.bulk_url_upload("my_documents", urls)
print(f"Uploaded {results['successful_uploads']} documents")
```
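`bulk_url_upload` returns a dictionary, and `successful_uploads` is the only key this README shows. The sketch below shows one plausible way to build such a summary. Each URL is processed independently, so one failed download does not abort the batch. `bulk_upload`, `upload_one`, `fake_upload`, `failed_uploads` and `errors` are hypothetical names, not zencognify's API.

```python
from typing import Callable


def bulk_upload(urls: list[str], upload_one: Callable[[str], None]) -> dict:
    """Try every URL and summarise the outcome (illustrative sketch only)."""
    summary: dict = {"successful_uploads": 0, "failed_uploads": 0, "errors": {}}
    for url in urls:
        try:
            upload_one(url)  # stands in for: download, convert, chunk, embed, store
        except Exception as exc:  # broad on purpose: record the failure and move on
            summary["failed_uploads"] += 1
            summary["errors"][url] = str(exc)
        else:
            summary["successful_uploads"] += 1
    return summary


def fake_upload(url: str) -> None:
    """Stand-in uploader that rejects anything that is not a PDF."""
    if not url.endswith(".pdf"):
        raise ValueError("unsupported file type")


results = bulk_upload(
    ["https://example.com/document1.pdf", "https://example.com/notes.txt"],
    fake_upload,
)
print(f"Uploaded {results['successful_uploads']} documents")  # Uploaded 1 documents
```

Recording per-URL errors rather than raising makes the return value usable for a retry pass over only the failed URLs.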
Retrieve Document Chunks
```python
chunks = vector_store.get_document_chunks("my_documents", "document_id")
```
Semantic Search
```python
results = vector_store.search("my_documents", "machine learning algorithms", top_k=5)
for doc, score in results:
    print(f"Score: {score:.3f}")
    print(f"Content: {doc.page_content[:200]}...")
    print(f"Source: {doc.metadata['source']}")
    print("---")
```
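Semantic search embeds the query and returns the stored chunks whose vectors lie closest to it. Qdrant does this with an index and whatever distance metric the collection was created with. The pure-Python sketch below only illustrates the scoring idea, using cosine similarity and brute-force top-k over toy vectors. The three-dimensional vectors and chunk IDs are made up and stand in for real OpenAI embeddings, which have many more dimensions.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: dict[str, list[float]], query: list[float], top_k: int = 5) -> list[tuple[str, float]]:
    """Brute-force top-k: score every stored vector and keep the k highest."""
    scored = ((chunk_id, cosine(vec, query)) for chunk_id, vec in index.items())
    return heapq.nlargest(top_k, scored, key=lambda item: item[1])


index = {
    "chunk-ml": [0.9, 0.1, 0.0],
    "chunk-nn": [0.8, 0.3, 0.1],
    "chunk-cooking": [0.0, 0.2, 0.9],
}
for chunk_id, score in search(index, [1.0, 0.2, 0.0], top_k=2):
    print(f"{chunk_id}: {score:.3f}")
```

Higher cosine scores mean closer matches, which is why results are taken largest-first. A real vector database replaces the full scan with an approximate index so that search stays fast as the collection grows.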
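Search Within a Document
To restrict the search to the chunks of one document, pass its ID:
```python
document_id = "your-document-id"
results = vector_store.search_by_document("my_documents", "neural networks", document_id, top_k=3)
```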
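A common way to restrict a vector search to one document is to filter on a document ID stored in each chunk's metadata. In Qdrant this is usually a payload filter applied during the search. This may or may not be how `search_by_document` works. The in-memory sketch below filters first and then ranks by cosine similarity. The `StoredChunk` record, the `document_id` metadata key and the sample data are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class StoredChunk:
    """One stored vector plus its payload (metadata)."""
    chunk_id: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    norm = math.hypot(*a) * math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def search_by_document(chunks: list[StoredChunk], query: list[float], document_id: str,
                       top_k: int = 3) -> list[tuple[str, float]]:
    # 1. Filter: keep only chunks that belong to the requested document.
    scored = [
        (c.chunk_id, cosine(c.vector, query))
        for c in chunks
        if c.metadata.get("document_id") == document_id
    ]
    # 2. Rank the remaining chunks by similarity, best first, and truncate.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


chunks = [
    StoredChunk("a-1", [1.0, 0.0], {"document_id": "doc-a"}),
    StoredChunk("a-2", [0.6, 0.8], {"document_id": "doc-a"}),
    StoredChunk("b-1", [1.0, 0.0], {"document_id": "doc-b"}),
]
print(search_by_document(chunks, [1.0, 0.0], "doc-a"))
```

Here `b-1` matches the query exactly but is still excluded, because the filter runs before ranking.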
Delete Documents
```python
chunk_ids = ["chunk_id1", "chunk_id2"]
vector_store.delete_documents("my_documents", chunk_ids)
```
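In the example above, `delete_documents` is given chunk IDs rather than a document ID. To remove a whole document, you would therefore first collect the IDs of its chunks, presumably from the results of `get_document_chunks`, and then pass those IDs in. The sketch below shows this two-step pattern on a plain dictionary. The store layout, the `document_id` metadata key and both helper names are hypothetical, not zencognify internals.

```python
def chunk_ids_for_document(store: dict[str, dict], document_id: str) -> list[str]:
    """Collect the IDs of every chunk whose metadata names document_id."""
    return [cid for cid, meta in store.items() if meta.get("document_id") == document_id]


def delete_chunks(store: dict[str, dict], chunk_ids: list[str]) -> int:
    """Remove the given chunk IDs; return how many were actually present."""
    removed = 0
    for cid in chunk_ids:
        if store.pop(cid, None) is not None:
            removed += 1
    return removed


store = {
    "chunk_id1": {"document_id": "doc-a"},
    "chunk_id2": {"document_id": "doc-a"},
    "chunk_id3": {"document_id": "doc-b"},
}
removed = delete_chunks(store, chunk_ids_for_document(store, "doc-a"))
print(removed, sorted(store))  # 2 ['chunk_id3']
```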
Requirements
- Python 3.12+
- Qdrant vector database
- OpenAI API key (for embeddings)
Dependencies
- langchain
- langchain-openai
- langchain-qdrant
- qdrant-client
- docling
- python-dotenv
- openai
- tiktoken
Environment Variables
Set your OpenAI API key:
```bash
export OPENAI_API_KEY="your-api-key-here"
```
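Embeddings are computed through the OpenAI API, so checking for the key before any work starts gives a clearer error than a request that fails partway through an upload. Below is a minimal standard-library check. The helper name `require_env` is hypothetical. python-dotenv is among the dependencies, so the package may also read a `.env` file, but this sketch does not.

```python
import os


def require_env(name: str) -> str:
    """Return a non-empty environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running ingestion or search")
    return value


# Hypothetical usage at program start-up:
# api_key = require_env("OPENAI_API_KEY")
```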
License
MIT License
Download files
Source Distribution
- zencognify-0.1.2.tar.gz (4.3 kB)
Built Distribution
- zencognify-0.1.2-py3-none-any.whl (4.7 kB)
File details
Details for the file zencognify-0.1.2.tar.gz.
File metadata
- Download URL: zencognify-0.1.2.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 89c9f1c31bbaaecb55de3d44e01daf15d020ebf4673207f81230c47c6155e1b6 |
| MD5 | 5b330df2ef15c99848b78b2e848b5e0c |
| BLAKE2b-256 | 24e9acdc65f4cb254c7fb530c3acbc5fa8b503a2c5c85d4a980f0644656342ba |
File details
Details for the file zencognify-0.1.2-py3-none-any.whl.
File metadata
- Download URL: zencognify-0.1.2-py3-none-any.whl
- Upload date:
- Size: 4.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ff2a93072a64a9cf50a702a6232c335a5295e3546e83426c7e66769d29983a4 |
| MD5 | f98a700da6db8833f0a8ba036c2b5660 |
| BLAKE2b-256 | 0641e97a0d9e9ca0b867135b05286893abbbc09fc3e64efef5fa92a9f049d47a |