Document ingestion and processing package with vector store and graph capabilities

Cognify

A Python package for document ingestion and processing with vector store capabilities.

Installation

pip install zencognify

Features

  • Document processing from URLs (PDF, DOC, etc.)
  • Vector storage with Qdrant
  • Bulk document upload
  • Document chunking and metadata management
  • OpenAI embeddings integration
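To illustrate what chunking with metadata means in practice, here is a minimal standard-library sketch. It is not Cognify's implementation: the `Chunk` class, the `chunk_text` helper, the metadata keys, and the character-based window sizes are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical container: a piece of text plus metadata about its origin."""
    text: str
    metadata: dict


def chunk_text(text: str, source_url: str,
               chunk_size: int = 500, overlap: int = 50) -> list[Chunk]:
    """Split text into overlapping character windows and tag each one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        piece = text[start:start + chunk_size]
        metadata = {"source": source_url, "chunk_index": index, "start": start}
        chunks.append(Chunk(piece, metadata))
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap keeps a little shared context between neighbouring chunks, and the metadata makes it possible to trace a chunk back to its source document.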

Usage

Basic Setup

from cognify import VectorStore

# Initialize vector store
vector_store = VectorStore(qdrant_url="http://localhost:6333")

Create Collection

# Create a new collection
vector_store.create_collection("my_documents")

Bulk Upload Documents

# Upload multiple documents from URLs
urls = [
    "https://example.com/document1.pdf",
    "https://example.com/document2.pdf"
]

results = vector_store.bulk_url_upload("my_documents", urls)
print(f"Uploaded {results['successful_uploads']} documents")
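Before a bulk upload it can help to drop malformed or duplicate URLs. The sketch below uses only the standard library. The `clean_urls` helper is hypothetical and not part of Cognify, and it assumes that only `http` and `https` URLs are worth sending.

```python
from urllib.parse import urlparse


def clean_urls(urls):
    """Keep well-formed http(s) URLs, dropping duplicates while preserving order."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        parsed = urlparse(url)
        # Skip anything without an http(s) scheme or without a host part.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if url in seen:
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned
```

The cleaned list can then be passed in place of `urls`, for example `vector_store.bulk_url_upload("my_documents", clean_urls(urls))`.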

Retrieve Document Chunks

# Get all chunks for a specific document
chunks = vector_store.get_document_chunks("my_documents", "document_id")

Delete Documents

# Delete specific document chunks
chunk_ids = ["chunk_id1", "chunk_id2"]
vector_store.delete_documents("my_documents", chunk_ids)
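Retrieval and deletion can be combined to remove every chunk of one document. This is a sketch under stated assumptions: the return shape of `get_document_chunks` is not documented here, so the hypothetical `delete_document` helper takes a `chunk_id_of` function, and its default assumes each chunk is a dict with an `"id"` key. Adjust it to whatever the library actually returns.

```python
def delete_document(vector_store, collection, document_id,
                    chunk_id_of=lambda chunk: chunk["id"]):
    """Delete all chunks of one document and return the chunk ids removed.

    ASSUMPTION: get_document_chunks returns an iterable of chunks, and
    chunk_id_of knows how to extract an id from each of them.
    """
    chunks = vector_store.get_document_chunks(collection, document_id)
    chunk_ids = [chunk_id_of(chunk) for chunk in chunks]
    if chunk_ids:  # avoid issuing a delete call with nothing to delete
        vector_store.delete_documents(collection, chunk_ids)
    return chunk_ids
```

Passing the id extractor as a parameter keeps the helper usable even if chunks turn out to be objects rather than dicts, for example `chunk_id_of=lambda c: c.id`.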

Requirements

  • Python 3.12+
  • Qdrant vector database
  • OpenAI API key (for embeddings)

Dependencies

  • langchain
  • langchain-openai
  • langchain-qdrant
  • qdrant-client
  • docling
  • python-dotenv
  • openai
  • tiktoken

Environment Variables

Set your OpenAI API key:

export OPENAI_API_KEY="your-api-key-here"
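A script can check for the key up front so that a missing variable fails with a clear message rather than partway through an upload. The `require_env` helper below is a hypothetical convenience, not part of Cognify.

```python
import os


def require_env(name: str, env=None) -> str:
    """Return the value of an environment variable or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the start of a script surfaces a missing key immediately. Since python-dotenv is among the dependencies, the variable could also be loaded from a `.env` file with `load_dotenv()` before this check.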

License

MIT License

Download files

Source Distribution

zencognify-0.1.0.tar.gz (4.1 kB view details)

Uploaded Source

Built Distribution

zencognify-0.1.0-py3-none-any.whl (4.6 kB view details)

Uploaded Python 3

File details

Details for the file zencognify-0.1.0.tar.gz.

File metadata

  • Download URL: zencognify-0.1.0.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for zencognify-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       ce709763e582f497fdbeceeabbb49fc9bd2d3e69e654a36f8aaeca24cd0d07ba
MD5          88d93e9fbf1db767175dfc0700096b9c
BLAKE2b-256  e9bcfa9339e9f732fe031880d933e58b64c25aec6a160e501341e2602448ee2a

File details

Details for the file zencognify-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: zencognify-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 4.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for zencognify-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       e5e0b76c7ad5165bd39befaa04aa8ae1db13ea8d531c99ece70a3505669baa4a
MD5          31bc774b9fadd096b58de8bf2e1e7738
BLAKE2b-256  017e5174d6539be4b334d7aec8b10b041e7e1358af88c88335e30b18eb1179f6
