
Document ingestion and processing package with vector store and graph capabilities


Zencognify

A Python package for document ingestion and processing with vector store capabilities.

Installation

pip install zencognify

Features

  • Document processing from URLs (PDF, DOC, etc.)
  • Vector storage with Qdrant
  • Bulk document upload
  • Document chunking and metadata management
  • OpenAI embeddings integration

Usage

Basic Setup

from cognify import VectorStore

# Initialize vector store
vector_store = VectorStore(qdrant_url="http://localhost:6333")

Create Collection

# Create a new collection
vector_store.create_collection("my_documents")
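Conceptually, a collection is a named set of points. Each point holds an embedding vector plus a payload of metadata, and the collection is searched by vector similarity. The toy class below sketches that idea in memory, using brute-force cosine similarity. `ToyCollectionStore` and its methods are hypothetical stand-ins, not zencognify's `VectorStore` or the Qdrant client.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyCollectionStore:
    """Hypothetical in-memory stand-in for a vector database.

    Each collection maps point ID -> (vector, payload).
    """

    def __init__(self) -> None:
        self.collections: dict[str, dict[str, tuple[list[float], dict]]] = {}

    def create_collection(self, name: str) -> None:
        # Toy choice: refuse to silently overwrite an existing collection.
        if name in self.collections:
            raise ValueError(f"collection {name!r} already exists")
        self.collections[name] = {}

    def upsert(self, name: str, point_id: str, vector: list[float], payload: dict) -> None:
        # Insert a new point or replace the one with the same ID.
        self.collections[name][point_id] = (vector, payload)

    def search(self, name: str, query: list[float], limit: int = 3) -> list[tuple[str, float]]:
        # Score every point against the query and return the best `limit` matches.
        scored = [(pid, cosine(query, vec)) for pid, (vec, _) in self.collections[name].items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:limit]
```

A real Qdrant collection is also created with a fixed vector size and distance metric. The README does not show which settings `create_collection` uses.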

Bulk Upload Documents

# Upload multiple documents from URLs
urls = [
    "https://example.com/document1.pdf",
    "https://example.com/document2.pdf"
]

results = vector_store.bulk_url_upload("my_documents", urls)
print(f"Uploaded {results['successful_uploads']} documents")
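The README shows only the `successful_uploads` key of the returned results. A common way to structure a bulk upload is to process each URL independently and record failures instead of aborting the whole batch. The sketch below assumes that design. `bulk_upload`, the `upload_one` callback, and every result key other than `successful_uploads` are hypothetical.

```python
from typing import Callable


def bulk_upload(urls: list[str], upload_one: Callable[[str], int]) -> dict:
    """Ingest each URL independently so one failure does not abort the batch.

    `upload_one` is a hypothetical callable that ingests a single URL, returns
    the number of chunks it stored, and raises on failure.
    """
    results = {"successful_uploads": 0, "failed_uploads": 0, "total_chunks": 0, "errors": {}}
    for url in urls:
        try:
            results["total_chunks"] += upload_one(url)
            results["successful_uploads"] += 1
        except Exception as exc:  # record the failure and continue with the next URL
            results["failed_uploads"] += 1
            results["errors"][url] = str(exc)
    return results
```

Keeping per-URL errors in the result lets a caller retry only the failed URLs.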

Retrieve Document Chunks

# Get all chunks for a specific document
chunks = vector_store.get_document_chunks("my_documents", "document_id")
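Retrieving "all chunks for a document" generally means filtering points by a document identifier stored in each chunk's payload, then ordering them by position. Below is a minimal in-memory sketch. It assumes hypothetical `document_id` and `chunk_index` payload fields, because the README does not document the payload schema. `chunks_for_document` is likewise a hypothetical helper.

```python
def chunks_for_document(points: dict[str, dict], document_id: str) -> list[dict]:
    """Return payloads of all chunks belonging to `document_id`, in reading order.

    `points` maps chunk ID -> payload; payloads are assumed to carry
    "document_id" and "chunk_index" fields (hypothetical schema).
    """
    # Keep only this document's chunks, and attach each chunk's ID to its payload copy.
    matches = [dict(payload, chunk_id=cid) for cid, payload in points.items()
               if payload.get("document_id") == document_id]
    # Restore the original order of the chunks within the document.
    return sorted(matches, key=lambda p: p["chunk_index"])
```

Against a real Qdrant collection, the same idea would be expressed as a server-side payload filter rather than a scan over every point in Python.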

Delete Documents

# Delete specific chunks by their chunk IDs
chunk_ids = ["chunk_id1", "chunk_id2"]
vector_store.delete_documents("my_documents", chunk_ids)

Requirements

  • Python 3.12+
  • Qdrant vector database
  • OpenAI API key (for embeddings)

Dependencies

  • langchain
  • langchain-openai
  • langchain-qdrant
  • qdrant-client
  • docling
  • python-dotenv
  • openai
  • tiktoken

Environment Variables

Set your OpenAI API key:

export OPENAI_API_KEY="your-api-key-here"
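Because embeddings require the key, it can help to fail fast with a clear message when the key is missing. The sketch below uses only the standard library; `require_env` is a hypothetical helper, not part of zencognify. Since python-dotenv is listed as a dependency, the key can probably also be supplied through a `.env` file loaded with `dotenv.load_dotenv()`, although the README does not say whether the package does this itself.

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable, or raise with a clear message if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the vector store")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at startup surfaces a missing key immediately, rather than during the first embedding request.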

License

MIT License

Download files

Source Distribution

zencognify-0.1.1.tar.gz (4.1 kB)

Uploaded Source

Built Distribution

zencognify-0.1.1-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file zencognify-0.1.1.tar.gz.

File metadata

  • Download URL: zencognify-0.1.1.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for zencognify-0.1.1.tar.gz
Algorithm Hash digest
SHA256 167890a2a9e8be3c69a6ff1cd60718f108ce4892c045aaa4a1edb36595ee94c9
MD5 f8e2b98b3be232c23ae9295108217828
BLAKE2b-256 5114894beb688f4fa25ed7bb9b4bbc2454f54fa0bb62522227c35de3ed94b460

File details

Details for the file zencognify-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: zencognify-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 4.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for zencognify-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 df18495f4ff8bf252854686835e37d95eccdb27057c9f12ed39f1647a73f5e98
MD5 5b6c67921b2a25278cc06080ed65c0fe
BLAKE2b-256 b5facadd512ae71d79bb7d32807924b8765e934451eea327778bb543610e5dda
