Tiny unofficial ChromaDB operations CLI to help you manage your ChromaDB instance.

ChromaDB Operations Tools

Tiny collection of utilities to help you manage ChromaDB indices.

WARNING: These tools rely on internal ChromaDB APIs and may break in the future.

☠️☠️☠️ BEFORE YOU BEGIN ☠️☠️☠️

Before you use these tools, make sure the ChromaDB persistent directory you intend to run them on is backed up.
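
One way to take such a backup is a plain directory copy. The following is a minimal sketch, not part of chops: the function name `backup_persist_dir` and the timestamped folder naming are assumptions made for illustration. It is safest to stop any process that is writing to the persist directory before copying it.

```python
import shutil
import time
from pathlib import Path


def backup_persist_dir(persist_dir: str, backup_root: str) -> Path:
    """Copy persist_dir into a new timestamped folder under backup_root.

    Hypothetical helper for illustration; it is not provided by chops.
    """
    src = Path(persist_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"Not a directory: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-backup-{stamp}"
    # copytree refuses to write into an existing destination,
    # so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

For example, `backup_persist_dir("/path/to/persist_dir", "/path/to/backups")` returns the path of the new copy.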

Installation

pip install chromadb-ops

Usage

Info

Gather general information about your persistent Chroma instance. This command is useful to understand what's going on internally in Chroma and to get recommendations or support from the team by providing the output.

chops info /path/to/persist_dir

Supported options are:

  • --skip-collection-names (-s) - to skip specific collections
  • --privacy-mode (-p) - privacy mode hides paths and collection names so that the output can be shared without exposing sensitive information

When sharing larger outputs consider storing the output in a file:

chops info /path/to/persist_dir -p > chroma_info.txt

Sample output:

                                 General Info
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                    Property ┃ Value                                          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│              Chroma Version │ 0.5.5                                          │
│        Number of Collection │ 1                                              │
│           Persist Directory │ /tmp/tmp9l3ceuvp                               │
│      Persist Directory Size │ 142.2MiB                                       │
│              SystemDB size: │ 81.6MiB (/tmp/tmp9l3ceuvp/chroma.sqlite3)      │
│     Orphan HNSW Directories │ []                                             │
└─────────────────────────────┴────────────────────────────────────────────────┘
───────────────────────────────── Collections ──────────────────────────────────
───────────────────────────────────── test ─────────────────────────────────────
                             'test' Collection Data
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃         Table Data ┃ Value                                                   ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│                 ID │ 9e80e4fd-fd4b-47b8-810c-e8ffa57c1912                    │
│               Name │ test                                                    │
│           Metadata │ None                                                    │
│          Dimension │ 1536                                                    │
│             Tenant │ default_tenant                                          │
│           Database │ default_database                                        │
│            Records │ 10,000                                                  │
│        WAL Entries │ 10,000                                                  │
└────────────────────┴─────────────────────────────────────────────────────────┘
─────────────────────────────────── Segments ───────────────────────────────────
                            Metadata Segment (test)
┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                Property ┃ Value                                              ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│              Segment ID │ 832fa2cd-6c40-4eee-ad7d-35f260acaaaa               │
│                    Type │ urn:chroma:segment/metadata/sqlite                 │
│                   Scope │ METADATA                                           │
│        SysDB Max Seq ID │ 10,000                                             │
└─────────────────────────┴────────────────────────────────────────────────────┘
                              HNSW Segment (test)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                     Property ┃ Value                                         ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│                   Segment ID │ 13609103-d317-4556-a744-008c96229b72          │
│                         Type │ urn:chroma:segment/vector/hnsw-local-persist… │
│                        Scope │ VECTOR                                        │
│                         Path │ /tmp/tmp9l3ceuvp/13609103-d317-4556-a744-008… │
│             SysDB Max Seq ID │ 0                                             │
│                HNSW Dir Size │ 60.6MiB                                       │
│     HNSW Metadata Max Seq ID │ 10,000                                        │
│   HNSW Metadata Total Labels │ 10,000                                        │
│                      WAL Gap │ 0                                             │
│ HNSW Raw Total Active Labels │ 10,000                                        │
│    HNSW Raw Allocated Labels │ 10,000                                        │
│           HNSW Orphan Labels │ set()                                         │
│          Fragmentation Level │ 0.0                                           │
└──────────────────────────────┴───────────────────────────────────────────────┘

⚠️ Interesting things to look for:

  • Fragmentation Level - the higher the value, the more memory your HNSW index wastes and the worse it performs. A high value means the index needs to be rebuilt.
  • Orphan HNSW Directories - these are directories that are not associated with any collection. They can be safely deleted.
  • WAL Entries - a high value usually means that you need to prune your WAL. Use either this tool or the official Chroma CLI.
  • HNSW Orphan Labels - this must always be an empty set. If you see anything else, report it in Discord.

WAL Commit

This command ensures your WAL is committed to the binary vector index (HNSW).

chops commit-wal /path/to/persist_dir

Note: You can skip certain collections by running chops commit-wal /path/to/persist_dir --skip <collection_name>

WAL Cleanup

This command cleans up the committed portion of the WAL and VACUUMs the database.

chops clean-wal /path/to/persist_dir

WAL Export

This command exports the WAL to a JSONL file. It can be useful for taking backups of the WAL.

chops export-wal /path/to/persist_dir --out /path/to/export.jsonl

Note: If --out (or -o) is not specified, the command prints the output to stdout.
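
Because the export is JSONL, it can be read back one record per line. The sketch below is an illustration only: the helper name `read_wal_export` is hypothetical, and since the fields of each record are not documented here, every record is treated as an opaque JSON value.

```python
import json
from typing import Any, Iterator


def read_wal_export(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSONL file.

    Hypothetical helper for illustration; it is not provided by chops.
    """
    with open(path, "r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc})") from exc
```

A quick sanity check on an export is to count its records with `sum(1 for _ in read_wal_export("/path/to/export.jsonl"))`.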

Full-Text Search Index Rebuild

This command rebuilds the full-text search index.

Note: Why is this needed? Users have reported broken FTS indices that result in an error of this kind: no such table: embedding_fulltext_search

chops rebuild-fts /path/to/persist_dir
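
To check whether a persist directory is affected before rebuilding, you can look for the table named in the error. This is a minimal sketch, not part of chops: it assumes the database file is `chroma.sqlite3` inside the persist directory (as in the sample `info` output), and the helper name `has_fts_table` is hypothetical. It only detects a missing table, not other kinds of index corruption.

```python
import sqlite3
from pathlib import Path


def has_fts_table(persist_dir: str, table: str = "embedding_fulltext_search") -> bool:
    """Return True if `table` is listed in the SQLite schema of the persist dir.

    Hypothetical helper for illustration; it is not provided by chops.
    """
    db_path = Path(persist_dir) / "chroma.sqlite3"  # assumed file name
    if not db_path.is_file():
        raise FileNotFoundError(db_path)
    # Open read-only so the check cannot modify the database.
    conn = sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
    try:
        row = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE name = ?", (table,)
        ).fetchone()
        return row is not None
    finally:
        conn.close()
```

If `has_fts_table("/path/to/persist_dir")` returns False, the rebuild command above is the likely fix.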

Using Docker

Note: You have to mount your persist directory into the container for the commands to work.

Building the image:

docker build -t chops .

WAL Commit

docker run -it --rm -v ./persist_dir:/chroma-data ghcr.io/amikos-tech/chromadb-ops/chops:latest commit-wal /chroma-data

WAL Cleanup

docker run -it --rm -v ./persist_dir:/chroma-data ghcr.io/amikos-tech/chromadb-ops/chops:latest clean-wal /chroma-data

WAL Export

docker run -it --rm -v ./persist_dir:/chroma-data -v ./backup:/backup ghcr.io/amikos-tech/chromadb-ops/chops:latest export-wal /chroma-data --out /backup/export.jsonl

Full-Text Search Index Rebuild

docker run -it --rm -v ./persist_dir:/chroma-data ghcr.io/amikos-tech/chromadb-ops/chops:latest rebuild-fts /chroma-data
