A file-based database using YAML as its file format

YamlDB

YamlDB is a lightweight, file-based database that stores its data as YAML. It provides a simple API for managing nested configuration data, with atomic writes, advisory file locking for concurrent access, and JMESPath-based querying.

Features

  • Nested Key Access: Use dot-notation (e.g., user.profile.name) to get or set values.
  • Atomic Writes: Ensures data integrity by writing to a temporary file before replacing the original.
  • Concurrency Locking: Uses system-level advisory locks (portalocker) to prevent data corruption during concurrent access.
  • Comment Preservation: Powered by ruamel.yaml, it preserves comments and formatting in your YAML files.
  • Write Optimization: An auto_flush mechanism and _dirty flag reduce unnecessary disk I/O.
  • Advanced Querying: Integrated JMESPath support for complex searches.
  • Type Casting: Cast values explicitly on write (set(..., cast=int)) or on read (get_as(key, int)).
  • Transactions: Atomic bulk updates with full rollback support.
  • CLI Tool: Manage your YAML databases directly from the terminal.
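
The Atomic Writes feature above describes a common pattern: write the new content to a temporary file, then rename it over the original. The following is a minimal sketch of that pattern using only the standard library. The helper name atomic_write_text is hypothetical and not part of the YamlDB API, and YamlDB's own implementation may differ.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Write text to path so readers never see a half-written file.

    Hypothetical helper for illustration; not YamlDB's actual code.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # replaces the target in a single step
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process dies before os.replace runs, the original file is untouched and only a stray .tmp file is left behind.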

Installation

Standard Installation

pip install .

Installation with Encryption Support

To use the :encrypt: backend, you need the cryptography library:

pip install ".[encrypt]"

Quick Start

Programmatic API (Computing Infrastructure Example)

YamlDB is ideal for managing infrastructure manifests, cluster configurations, and node metadata.

from yamldb import YamlDB

# Initialize DB for cluster configuration
db = YamlDB(filename="cluster_config.yml", auto_flush=True)

# Define infrastructure components using dot-notation
db.set("cluster.name", "hpc-cluster-01")
db.set("cluster.nodes.node01.gpu_count", "8", cast=int)
db.set("cluster.nodes.node01.status", "online")
db.set("cluster.nodes.node02.gpu_count", "4", cast=int)
db.set("cluster.nodes.node02.status", "maintenance")

# Retrieve infrastructure details
gpu_count = db.get_as("cluster.nodes.node01.gpu_count", int)
status = db.get("cluster.nodes.node01.status")

# Advanced Search (JMESPath)
# Find all nodes that are currently 'online'.
# 'cluster.nodes' is a mapping, so take its values before applying the filter.
online_nodes = db.search("values(cluster.nodes)[?status=='online']")

# Bulk Updates in a Transaction (e.g., updating cluster version)
with db.transaction():
    db.set("cluster.version", "2.4.1")
    db.set("cluster.last_updated", "2026-04-27")
    # If an exception occurs, the version won't be partially updated
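
The rollback behaviour of the transaction block can be pictured as "snapshot on entry, restore on error". The sketch below illustrates that idea with a hypothetical in-memory class, TinyStore. It is an assumption about one way to get this behaviour, not YamlDB's actual implementation.

```python
import copy
from contextlib import contextmanager


class TinyStore:
    """Hypothetical in-memory stand-in used only to illustrate rollback."""

    def __init__(self):
        self.data = {}

    @contextmanager
    def transaction(self):
        snapshot = copy.deepcopy(self.data)  # state before the block runs
        try:
            yield self
        except Exception:
            self.data = snapshot  # undo every change made inside the block
            raise


store = TinyStore()
store.data["version"] = "2.4.0"

try:
    with store.transaction():
        store.data["version"] = "2.4.1"
        raise RuntimeError("simulated failure mid-update")
except RuntimeError:
    pass

print(store.data["version"])  # 2.4.0
```

A real file-backed store would also defer the disk write until the block exits cleanly, which matches the note in Configuration that auto_flush does not write inside a transaction.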

CLI Usage

The yamldb CLI lets you read and modify YAML databases directly from the terminal.

General Usage

yamldb [OPTIONS] COMMAND [ARGS]...

Commands

get: Retrieve a value using dot-notation.

yamldb get <file> <key>
# Example: yamldb get config.yml user.profile.name

set: Set a value. Automatically creates parent keys if they don't exist.

yamldb set <file> <key> <value>
# Example: yamldb set config.yml app.version 1.2.0
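
The automatic creation of parent keys can be illustrated on a plain nested dict. This is a sketch only: set_by_path is a hypothetical helper, not a YamlDB function, and it assumes every intermediate value is a mapping.

```python
def set_by_path(data, dotted_key, value):
    """Set data[a][b][c] = value for dotted_key 'a.b.c', creating parents.

    Hypothetical helper for illustration; not YamlDB's actual code.
    """
    *parents, leaf = dotted_key.split(".")
    node = data
    for part in parents:
        # setdefault creates the intermediate mapping when it is missing.
        node = node.setdefault(part, {})
    node[leaf] = value


config = {}
set_by_path(config, "app.version", "1.2.0")
set_by_path(config, "app.name", "demo")
print(config)  # {'app': {'version': '1.2.0', 'name': 'demo'}}
```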

delete: Remove a key from the database.

yamldb delete <file> <key>
# Example: yamldb delete config.yml user.old_setting

search: Query the database using JMESPath expressions.

yamldb search <file> <query>
# Example: yamldb search config.yml "[?status=='active']"

stats: Display write efficiency and I/O statistics.

yamldb stats <file>
# Example: yamldb stats config.yml

Advanced API Reference

items_recursive()

A generator that yields all leaf nodes in the database as (dot_notation_key, value) pairs. Useful for auditing entire infrastructure states.

for key, value in db.items_recursive():
    print(f"{key}: {value}")
# Output: cluster.nodes.node01.gpu_count: 8 ...
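
To make the (dot_notation_key, value) shape concrete, here is a sketch of a recursive generator over a plain nested dict. The function iter_leaves is hypothetical and treats anything that is not a dict (including lists) as a leaf; how YamlDB itself handles lists is not stated here.

```python
def iter_leaves(node, prefix=""):
    """Yield (dotted_key, value) for every non-dict leaf of a nested dict.

    Hypothetical helper for illustration; not YamlDB's actual code.
    """
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            yield from iter_leaves(value, path)  # descend into sub-mappings
        else:
            yield path, value


cluster = {
    "cluster": {
        "name": "hpc-cluster-01",
        "nodes": {"node01": {"gpu_count": 8, "status": "online"}},
    }
}

for key, value in iter_leaves(cluster):
    print(f"{key}: {value}")
# cluster.name: hpc-cluster-01
# cluster.nodes.node01.gpu_count: 8
# cluster.nodes.node01.status: online
```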

find_all(value) & filter(predicate)

Quickly locate infrastructure components based on their state.

# Find all nodes that are in 'maintenance' mode
maintenance_nodes = db.find_all("maintenance")

# Find entries with more than 4 GPUs. The predicate only sees values,
# so it matches any integer value greater than 4.
high_capacity_nodes = db.filter(lambda v: isinstance(v, int) and v > 4)

update_many(data_dict)

Perform multiple infrastructure updates atomically.

db.update_many({
    "cluster.nodes.node01.status": "offline",
    "cluster.nodes.node01.last_reboot": "2026-04-27",
    "cluster.global.maintenance_mode": True
})

Wildcard Retrieval

You can use the * wildcard in get() or via bracket access to retrieve multiple values across the database. This is powered by JMESPath under the hood.

# Get the status of ALL nodes in the cluster
# Returns a list, e.g. ['online', 'maintenance'] for the Quick Start data
statuses = db.get("cluster.nodes.*.status")

# Get the GPU count for all nodes
# Returns a list, e.g. [8, 4] for the Quick Start data
gpu_counts = db["cluster.nodes.*.gpu_count"]
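
The effect of the * wildcard can be approximated in pure Python. The sketch below is not JMESPath and not YamlDB's code; get_with_wildcard is a hypothetical helper that only handles mappings and always returns a list.

```python
def get_with_wildcard(data, dotted_key):
    """Resolve a dotted key where '*' matches every child of a mapping.

    Hypothetical helper for illustration; not YamlDB's actual code.
    """
    nodes = [data]
    for part in dotted_key.split("."):
        next_nodes = []
        for node in nodes:
            if not isinstance(node, dict):
                continue  # cannot descend into a scalar
            if part == "*":
                next_nodes.extend(node.values())  # fan out over all children
            elif part in node:
                next_nodes.append(node[part])
        nodes = next_nodes
    return nodes


cluster = {
    "cluster": {
        "nodes": {
            "node01": {"gpu_count": 8, "status": "online"},
            "node02": {"gpu_count": 4, "status": "maintenance"},
        }
    }
}

print(get_with_wildcard(cluster, "cluster.nodes.*.status"))
# ['online', 'maintenance']
```

Results come back in the insertion order of the mapping, so the statuses and GPU counts line up index by index.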

Write Efficiency (get_stats)

Track how many disk writes were avoided thanks to the _dirty flag.

stats = db.get_stats()
print(f"Write Efficiency: {stats['write_efficiency']}")
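
How a dirty flag avoids writes can be sketched with a small counter class. Everything here is an assumption made for illustration: DirtyTracker is hypothetical, and the formula used for write efficiency (the share of flush calls that needed no write) may not match what get_stats() actually reports.

```python
class DirtyTracker:
    """Hypothetical illustration of a dirty flag; not YamlDB's actual code."""

    def __init__(self):
        self.data = {}
        self._dirty = False
        self.flush_calls = 0
        self.writes_done = 0

    def set(self, key, value):
        if key in self.data and self.data[key] == value:
            return  # nothing changed, so the store stays clean
        self.data[key] = value
        self._dirty = True

    def flush(self):
        self.flush_calls += 1
        if not self._dirty:
            return  # skip the disk write entirely
        self.writes_done += 1  # a real store would serialize to disk here
        self._dirty = False

    def write_efficiency(self):
        """Fraction of flush calls that needed no disk write."""
        if self.flush_calls == 0:
            return 0.0
        return 1 - self.writes_done / self.flush_calls


tracker = DirtyTracker()
tracker.set("cpu", 45)
tracker.flush()          # data changed: one write
tracker.set("cpu", 45)
tracker.flush()          # same value again: write skipped
tracker.flush()          # nothing pending: write skipped
tracker.flush()          # nothing pending: write skipped
print(tracker.write_efficiency())  # 0.75
```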

Configuration

  • filename: Path to the YAML file.
  • backend:
    • :file: (default): Standard human-readable YAML storage.
    • :memory:: In-memory storage (no disk I/O).
    • :binary:: High-performance binary storage using JSON serialization.
  • auto_flush: If True (default), changes are written to disk immediately unless inside a transaction.

Advanced Features

Binary Storage

For applications requiring high performance and smaller file sizes, use the :binary: backend.

db = YamlDB(filename="data.bin", backend=":binary:")
db.set("metrics.cpu", 45)

# Export binary data to human-readable YAML for debugging
db.convert_to_yaml("debug_export.yml")

Secure Storage (Encryption)

For sensitive data, use the :encrypt: backend. This encrypts the entire database file (including keys and structure) using AES-128 symmetric encryption.

Note: The :encrypt: backend is currently experimental. We are actively refining its implementation and would greatly appreciate your feedback!

# Initialize an encrypted database
db = YamlDB(
    filename="secrets.enc", 
    backend=":encrypt:", 
    password="your-strong-password"
)

# Use it exactly like a normal YamlDB
db.set("cluster.admin_password", "super-secret-123")
db.set("cluster.api_key", "abc-123-def-456")

# The file 'secrets.enc' is now a binary blob that is unreadable 
# without the correct password.

Web UI Prototype

YamlDB comes with a lightweight Web UI for visual data management.

To run the Web UI:

  1. Install dependencies: pip install fastapi uvicorn
  2. Run the server: python yamldb/bin/run_webui.py
  3. Open your browser to http://localhost:8000

The Web UI lets you browse the database tree, set or delete values via dot-notation, and monitor write efficiency in real time.

