Document ingestion and processing package with vector store and graph capabilities

Zencognify

A Python package for document ingestion and processing with vector store capabilities.

Installation

pip install zencognify

Features

  • Document processing from URLs (PDF, DOC, etc.)
  • Vector storage with Qdrant
  • Bulk document upload
  • Document chunking and metadata management
  • OpenAI embeddings integration
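
To make the chunking feature concrete, here is a minimal sketch of splitting text into overlapping chunks with basic metadata. It is not the package's implementation: the function name `chunk_text`, its parameters, and the metadata keys are assumptions made for illustration only.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[dict]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append({
            "chunk_index": index,   # position of the chunk in the document
            "start": start,         # character offset in the original text
            "text": text[start:start + chunk_size],
        })
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of some duplicated storage.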

Usage

Basic Setup

from cognify import VectorStore

vector_store = VectorStore(qdrant_url="http://localhost:6333")

Create Collection

vector_store.create_collection("my_documents")

Bulk Upload Documents

urls = [
    "https://example.com/document1.pdf",
    "https://example.com/document2.pdf"
]

results = vector_store.bulk_url_upload("my_documents", urls)
print(f"Uploaded {results['successful_uploads']} documents")
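
Before a bulk upload it can help to drop obviously malformed URLs so they are not counted as failed uploads. The sketch below uses only the standard library. `split_valid_urls` is a hypothetical helper and not part of the package; whether `bulk_url_upload` performs its own validation is not stated here.

```python
from urllib.parse import urlparse


def split_valid_urls(urls):
    """Separate http(s) URLs that have a host from everything else."""
    valid, rejected = [], []
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(url)
        else:
            rejected.append(url)
    return valid, rejected
```

The `valid` list can then be passed to `bulk_url_upload` in place of the raw list, and `rejected` can be logged for review.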

Retrieve Document Chunks

chunks = vector_store.get_document_chunks("my_documents", "document_id")

Semantic Search

results = vector_store.search("my_documents", "machine learning algorithms", top_k=5)
for doc, score in results:
    print(f"Score: {score:.3f}")
    print(f"Content: {doc.page_content[:200]}...")
    print(f"Source: {doc.metadata['source']}")
    print("---")

# Restrict the search to the chunks of a single document
document_id = "your-document-id"
results = vector_store.search_by_document("my_documents", "neural networks", document_id, top_k=3)
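
Search results arrive as (document, score) pairs, so weak matches can be dropped after the fact. This is a minimal sketch that assumes a higher score means a closer match, which depends on how the collection's distance metric is configured. `filter_by_score` is a hypothetical helper, not part of the package.

```python
def filter_by_score(results, min_score):
    """Keep only (doc, score) pairs whose score is at least min_score.

    Assumes higher scores indicate closer matches.
    """
    return [(doc, score) for doc, score in results if score >= min_score]
```

For example, `filter_by_score(results, 0.75)` keeps the original ordering and discards every pair scoring below 0.75.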

Delete Documents

chunk_ids = ["chunk_id1", "chunk_id2"]
vector_store.delete_documents("my_documents", chunk_ids)

Requirements

  • Python 3.12+
  • Qdrant vector database
  • OpenAI API key (for embeddings)

Dependencies

  • langchain
  • langchain-openai
  • langchain-qdrant
  • qdrant-client
  • docling
  • python-dotenv
  • openai
  • tiktoken

Environment Variables

Set your OpenAI API key:

export OPENAI_API_KEY="your-api-key-here"
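
A script can fail early with a clear message when the key is missing, instead of hitting an error on the first embedding request. The sketch below uses only the standard library. `get_openai_api_key` is a hypothetical helper; how the package itself reads the key is not documented here.

```python
import os


def get_openai_api_key(env=None):
    """Return OPENAI_API_KEY from the environment, or raise if it is unset."""
    env = os.environ if env is None else env  # allow a dict for testing
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before generating embeddings"
        )
    return key
```

Calling `get_openai_api_key()` once at startup, before any upload or search, surfaces a missing key immediately.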

License

MIT License
