Command-line tools for managing document stores like Qdrant and Solr

docstore-manager

A general-purpose command-line tool for managing document stores. It currently supports the Qdrant vector database and the Solr search platform, and exposes common management tasks for both through a single CLI.

Features

  • Multi-platform Support:
    • Qdrant vector database for similarity search and vector operations
    • Solr search platform for text search and faceted navigation
  • Collection Management:
    • Create, delete, and list collections
    • Get detailed information about collections
  • Document Operations:
    • Add or update documents in collections
    • Remove documents from collections
    • Retrieve documents by ID
  • Search Capabilities:
    • Vector similarity search (Qdrant)
    • Full-text search (Solr)
    • Filtering and faceting
  • Batch Operations:
    • Add fields to documents
    • Delete fields from documents
    • Replace fields in documents
  • Advanced Features:
    • Support for JSON path selectors for precise document modifications
    • Support for multiple configuration profiles
    • Flexible output formatting (JSON, YAML, CSV)

Installation

# From PyPI
pipx install docstore-manager

# From source
git clone https://github.com/allenday/docstore-manager.git
cd docstore-manager
pipx install -e .

Configuration

When first run, docstore-manager will create a configuration file at:

  • Linux/macOS: ~/.config/docstore-manager/config.yaml
  • Windows: %APPDATA%\docstore-manager\config.yaml
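
As a rough illustration of the two locations listed above, the following Python sketch computes the expected path for a given platform. The helper name `default_config_path` is hypothetical and is not part of docstore-manager; it only mirrors the documented paths and may differ from how the tool resolves them internally.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def default_config_path(platform: str, env: dict, home: str):
    """Return the documented config.yaml location for a platform.

    Hypothetical helper for illustration only. `platform` follows the
    sys.platform convention ("win32", "linux", "darwin").
    """
    if platform.startswith("win"):
        # Windows: %APPDATA%/docstore-manager/config.yaml
        return PureWindowsPath(env["APPDATA"]) / "docstore-manager" / "config.yaml"
    # Linux/macOS: ~/.config/docstore-manager/config.yaml
    return PurePosixPath(home) / ".config" / "docstore-manager" / "config.yaml"


if __name__ == "__main__":
    import sys

    print(default_config_path(sys.platform, dict(os.environ), os.path.expanduser("~")))
```

Running the script prints the path to check on the current machine.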

You can edit this file to add your connection details and schema configuration:

default:
  # Common settings for all document stores
  connection:
    type: qdrant  # or solr
    collection: my-collection

  # Qdrant-specific settings
  qdrant:
    url: localhost
    port: 6333
    api_key: ""
    vectors:
      size: 256
      distance: cosine
      indexing_threshold: 0
    payload_indices:
      - field: category
        type: keyword
      - field: created_at
        type: datetime
      - field: price
        type: float

  # Solr-specific settings
  solr:
    url: http://localhost:8983/solr
    username: ""
    password: ""
    schema:
      fields:
        - name: id
          type: string
        - name: title
          type: text_general
        - name: content
          type: text_general
        - name: category
          type: string
        - name: created_at
          type: pdate

production:
  connection:
    type: qdrant
    collection: production-collection

  qdrant:
    url: your-production-instance.region.cloud.qdrant.io
    port: 6333
    api_key: your-production-api-key
    vectors:
      size: 1536  # For OpenAI embeddings
      distance: cosine
      indexing_threshold: 1000
    payload_indices:
      - field: product_id
        type: keyword
      - field: timestamp
        type: datetime

  solr:
    url: https://your-production-solr.example.com/solr
    username: admin
    password: your-production-password

Each profile can define its own:

  • Connection settings for both Qdrant and Solr
  • Vector configuration for Qdrant (size, distance metric, indexing behavior)
  • Schema configuration for Solr
  • Payload indices for optimized search performance

The YAML format makes it easy to maintain a clean, organized configuration across multiple environments.

You can switch between profiles using the --profile flag:

docstore-manager --profile production qdrant list

You can also override profile settings with command-line options such as --url, --port, or --collection.
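
One way to picture this precedence is a simple overlay in which a value given on the command line replaces the profile value and omitted options fall back to the profile. The sketch below is an assumption about the general idea, not the tool's actual implementation; `resolve_settings` and the `staging-collection` name are hypothetical.

```python
def resolve_settings(profile: dict, overrides: dict) -> dict:
    """Overlay command-line overrides on profile values (hypothetical helper).

    An override of None means the option was not given, so the profile
    value is kept.
    """
    resolved = dict(profile)
    for key, value in overrides.items():
        if value is not None:
            resolved[key] = value
    return resolved


# Values taken from the `production` profile in the sample configuration.
production = {
    "url": "your-production-instance.region.cloud.qdrant.io",
    "port": 6333,
    "collection": "production-collection",
}

# Comparable to passing only `--collection staging-collection`.
settings = resolve_settings(
    production,
    {"url": None, "port": None, "collection": "staging-collection"},
)
print(settings["collection"])  # staging-collection
print(settings["port"])        # 6333
```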

Testing

This project uses pytest for testing. Tests are divided into two main categories:

  • Unit Tests: These tests verify individual components in isolation and do not require external services. They are fast and should be run frequently during development.
  • Integration Tests: These tests verify the interaction between the CLI tool and external services (Qdrant, Solr). They require these services to be running (e.g., via docker-compose up -d) and are marked with @pytest.mark.integration.

Running Tests:

  • Run only Unit Tests (Default Behavior):

    pytest -v
    

    (Integration tests are skipped by default)

  • Run only Integration Tests:

    # First, ensure Qdrant/Solr containers are running (e.g., docker-compose up -d)
    RUN_INTEGRATION_TESTS=true pytest -m integration -v
    

    (Requires setting the RUN_INTEGRATION_TESTS environment variable)

  • Run All Tests (Unit + Integration):

    # First, ensure Qdrant/Solr containers are running
    RUN_INTEGRATION_TESTS=true pytest -v
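
The skip-by-default behavior described above is commonly implemented with a pytest collection hook in conftest.py. The sketch below shows that general pattern. It is not taken from this project's test suite, and the helper name `integration_enabled` is hypothetical.

```python
import os


def integration_enabled(environ=os.environ) -> bool:
    """True when RUN_INTEGRATION_TESTS is set to "true" (case-insensitive)."""
    return environ.get("RUN_INTEGRATION_TESTS", "").strip().lower() == "true"


def pytest_collection_modifyitems(config, items):
    """pytest hook: skip tests marked `integration` unless opted in."""
    if integration_enabled():
        return
    # Imported here so the helper above can be used without pytest installed.
    import pytest

    skip = pytest.mark.skip(
        reason="set RUN_INTEGRATION_TESTS=true to run integration tests"
    )
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip)
```

With a hook like this, a plain `pytest -v` reports integration tests as skipped rather than failing when Qdrant or Solr is not running.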
    

Usage

docstore-manager <document-store> <command> [options]

Document Stores:

  • qdrant: Commands for Qdrant vector database
  • solr: Commands for Solr search platform

Available Commands:

  • list: List all collections
  • create: Create a new collection
  • delete: Delete an existing collection
  • info: Get detailed information about a collection
  • add-documents: Add documents to a collection
  • remove-documents: Remove documents from a collection
  • get: Retrieve documents by ID
  • search: Search documents in a collection
  • scroll: Scroll through documents in a collection (Qdrant only)
  • count: Count documents in a collection (Qdrant only)
  • batch: Add, delete, or replace fields in documents selected by filter or ID (see the Qdrant examples)
  • config: View available configuration profiles

Connection Options:

--profile PROFILE  Configuration profile to use
--url URL          Server URL
--port PORT        Server port (Qdrant only)
--api-key API_KEY  API key (Qdrant only)
--username USER    Username (Solr only)
--password PASS    Password (Solr only)
--collection NAME  Collection name
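
When calling the tool from a script, it can help to assemble the argument list programmatically. The sketch below is only an illustration: `build_command` is a hypothetical helper, and it uses only the option names listed above together with the `<document-store> <command>` ordering from the usage line.

```python
import shlex


def build_command(store, command, profile=None, **options):
    """Assemble a docstore-manager argument list (hypothetical helper).

    Keyword options map to flags: collection="my-collection" becomes
    `--collection my-collection`, and underscores become hyphens.
    """
    argv = ["docstore-manager"]
    if profile is not None:
        # As in the README's profile examples, --profile precedes the store.
        argv += ["--profile", profile]
    argv += [store, command]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


argv = build_command("qdrant", "info", profile="production", collection="my-collection")
print(shlex.join(argv))
# docstore-manager --profile production qdrant info --collection my-collection
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues.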

Examples:

Qdrant Examples:

# List all Qdrant collections
docstore-manager qdrant list

# Create a new Qdrant collection with custom settings
docstore-manager qdrant create --collection my-collection --size 1536 --distance euclid

# Get info about a Qdrant collection
docstore-manager qdrant info --collection my-collection

# Retrieve points by ID from Qdrant
docstore-manager qdrant get --ids "1,2,3" --with-vectors

# Search Qdrant using vector similarity
docstore-manager qdrant search --vector-file query_vector.json --limit 10

# Retrieve points using a filter and save as CSV
docstore-manager qdrant get --filter '{"key":"category","match":{"value":"product"}}' \
  --format csv --output results.csv

# Add a field to documents matching a filter
docstore-manager qdrant batch --filter '{"key":"category","match":{"value":"product"}}' \
  --add --doc '{"processed": true}'

# Delete a field from specific documents
docstore-manager qdrant batch --ids "doc1,doc2,doc3" --delete --selector "metadata.temp_data"

# Replace fields in documents from an ID file
docstore-manager qdrant batch --id-file my_ids.txt --replace --selector "metadata.source" \
  --doc '{"provider": "new-provider", "date": "2025-03-31"}'
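
Hand-writing the JSON passed to --filter is error-prone once values contain quotes or spaces. The following sketch generates the same filter string used in the examples above and quotes it for a POSIX shell. `match_filter` is a hypothetical helper, and only the single key/match condition shown in this README is covered.

```python
import json
import shlex


def match_filter(key, value):
    """Build one key/match condition like the --filter examples (hypothetical helper)."""
    return {"key": key, "match": {"value": value}}


# Compact separators reproduce the spacing used in the README examples.
filter_json = json.dumps(match_filter("category", "product"), separators=(",", ":"))
print(filter_json)
# {"key":"category","match":{"value":"product"}}

# shlex.quote wraps the JSON so a POSIX shell passes it as a single argument.
print(
    "docstore-manager qdrant get --filter "
    + shlex.quote(filter_json)
    + " --format csv --output results.csv"
)
```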

Solr Examples:

# List all Solr collections
docstore-manager solr list

# Create a new Solr collection
docstore-manager solr create --collection my-collection

# Get info about a Solr collection
docstore-manager solr info --collection my-collection

# Add documents to Solr from a file
docstore-manager solr add-documents --collection my-collection --file documents.json

# Search documents in Solr
docstore-manager solr search --collection my-collection --query "title:example" --fields "id,title,score"

# Get documents by ID from Solr
docstore-manager solr get --collection my-collection --ids "doc1,doc2,doc3"

# Remove documents from Solr by query
docstore-manager solr remove-documents --collection my-collection --query "category:obsolete"
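
The README does not specify the layout of documents.json. The sketch below assumes, without confirmation from the project, that --file accepts a JSON array of objects, and it uses the field names from the sample Solr schema in the Configuration section. `make_document` and the sample values are hypothetical.

```python
import json
from datetime import datetime, timezone


def make_document(doc_id, title, content, category, created_at):
    """Return one document using the sample schema's field names (hypothetical helper)."""
    return {
        "id": doc_id,
        "title": title,
        "content": content,
        "category": category,
        # Solr date fields expect ISO 8601 UTC timestamps ending in "Z".
        "created_at": created_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


documents = [
    make_document(
        "doc1",
        "Example title",
        "Example body text.",
        "product",
        datetime(2025, 3, 31, tzinfo=timezone.utc),
    ),
]

# Assumption: the tool reads a JSON array of documents from this file.
with open("documents.json", "w", encoding="utf-8") as handle:
    json.dump(documents, handle, indent=2)
```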

Switching Between Profiles:

# Use the production profile with Qdrant
docstore-manager --profile production qdrant list

# Use the production profile with Solr
docstore-manager --profile production solr list

Changelog

v0.1.0 (2025-05-03)

  • Initial release of docstore-manager
  • Support for both Qdrant and Solr document stores
  • Comprehensive usage examples for all operations
  • Improved error handling and logging
  • Standardized interfaces across document store implementations
  • Configuration profiles for different environments
  • Command-line interface for managing collections and documents
  • Detailed documentation and API reference
  • Renamed from "Qdrant Manager" to "docstore-manager"
  • Consolidated CLI entry points to a single docstore-manager command
  • Improved test coverage and reliability
  • Enhanced formatting options for command outputs
  • Fixed collection info formatting issues
  • Fixed CLI testing context handling
  • Fixed parameter validation in get_documents function
  • Fixed CollectionConfig validation

License

Apache-2.0

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
