A file based database using yaml as file format
Project description
YamlDB
YamlDB is a lightweight, file-based database that uses YAML for storage. It provides a simple API for managing nested configuration data with support for atomic writes, concurrency locking, and advanced querying.
Features
- Nested Key Access: Use dot-notation (e.g.,
user.profile.name) to get or set values. - Atomic Writes: Ensures data integrity by writing to a temporary file before replacing the original.
- Concurrency Locking: Uses system-level advisory locks (
portalocker) to prevent data corruption during concurrent access. - Comment Preservation: Powered by
ruamel.yaml, it preserves comments and formatting in your YAML files. - Write Optimization: An
auto_flushmechanism and_dirtyflag reduce unnecessary disk I/O. - Advanced Querying: Integrated JMESPath support for complex searches.
- Type Casting: Hybrid system for explicit casting during storage and retrieval.
- Transactions: Atomic bulk updates with full rollback support.
- CLI Tool: Manage your YAML databases directly from the terminal.
Installation
Standard Installation
pip install .
Installation with Encryption Support
To use the :encrypt: backend, you need the cryptography library:
pip install ".[encrypt]"
Quick Start
Programmatic API (Computing Infrastructure Example)
YamlDB is ideal for managing infrastructure manifests, cluster configurations, and node metadata.
from yamldb import YamlDB
# Initialize DB for cluster configuration
db = YamlDB(filename="cluster_config.yml", auto_flush=True)
# Define infrastructure components using dot-notation
db.set("cluster.name", "hpc-cluster-01")
db.set("cluster.nodes.node01.gpu_count", "8", cast=int)
db.set("cluster.nodes.node01.status", "online")
db.set("cluster.nodes.node02.gpu_count", "4", cast=int)
db.set("cluster.nodes.node02.status", "maintenance")
# Retrieve infrastructure details
gpu_count = db.get_as("cluster.nodes.node01.gpu_count", int)
status = db.get("cluster.nodes.node01.status")
# Advanced Search (JMESPath)
# Find all nodes that are currently 'online'
online_nodes = db.search("cluster.nodes.[?status=='online']")
# Bulk Updates in a Transaction (e.g., updating cluster version)
with db.transaction():
db.set("cluster.version", "2.4.1")
db.set("cluster.last_updated", "2026-04-27")
# If an exception occurs, the version won't be partially updated
CLI Usage
The yamldb CLI provides a powerful way to interact with your YAML databases directly from the terminal.
General Usage
yamldb [OPTIONS] COMMAND [ARGS]...
Commands
get: Retrieve a value using dot-notation.
yamldb get <file> <key>
# Example: yamldb get config.yml user.profile.name
set: Set a value. Automatically creates parent keys if they don't exist.
yamldb set <file> <key> <value>
# Example: yamldb set config.yml app.version 1.2.0
delete: Remove a key from the database.
yamldb delete <file> <key>
# Example: yamldb delete config.yml user.old_setting
search: Query the database using JMESPath expressions.
yamldb search <file> <query>
# Example: yamldb search config.yml "[?status=='active']"
stats: Display write efficiency and I/O statistics.
yamldb stats <file>
# Example: yamldb stats config.yml
Advanced API Reference
items_recursive()
A generator that yields all leaf nodes in the database as (dot_notation_key, value) pairs. Useful for auditing entire infrastructure states.
for key, value in db.items_recursive():
print(f"{key}: {value}")
# Output: cluster.nodes.node01.gpu_count: 8 ...
find_all(value) & filter(predicate)
Quickly locate infrastructure components based on their state.
# Find all nodes that are in 'maintenance' mode
maintenance_nodes = db.find_all("maintenance")
# Find all nodes with more than 4 GPUs
high_capacity_nodes = db.filter(lambda v: isinstance(v, int) and v > 4)
update_many(data_dict)
Perform multiple infrastructure updates atomically.
db.update_many({
"cluster.nodes.node01.status": "offline",
"cluster.nodes.node01.last_reboot": "2026-04-27",
"cluster.global.maintenance_mode": True
})
Wildcard Retrieval
You can use the * wildcard in get() or via bracket access to retrieve multiple values across the database. This is powered by JMESPath under the hood.
# Get the status of ALL nodes in the cluster
# Returns a list: ['online', 'maintenance', 'online']
statuses = db.get("cluster.nodes.*.status")
# Get the GPU count for all nodes
# Returns a list: [8, 4, 16]
gpu_counts = db["cluster.nodes.*.gpu_count"]
Write Efficiency (get_stats)
Track how many disk writes were avoided thanks to the _dirty flag.
stats = db.get_stats()
print(f"Write Efficiency: {stats['write_efficiency']}")
Configuration
filename: Path to the YAML file.backend::file:(default): Standard human-readable YAML storage.:memory:: In-memory storage (no disk I/O).:binary:: High-performance binary storage using JSON serialization.
auto_flush: IfTrue(default), changes are written to disk immediately unless inside a transaction.
Advanced Features
Binary Storage
For applications requiring high performance and smaller file sizes, use the :binary: backend.
db = YamlDB(filename="data.bin", backend=":binary:")
db.set("metrics.cpu", 45)
# Export binary data to human-readable YAML for debugging
db.convert_to_yaml("debug_export.yml")
Secure Storage (Encryption)
For sensitive data, use the :encrypt: backend. This encrypts the entire database file (including keys and structure) using AES-128 symmetric encryption.
Note: The
:encrypt:backend is currently experimental. We are actively refining its implementation and would greatly appreciate your feedback!
# Initialize an encrypted database
db = YamlDB(
filename="secrets.enc",
backend=":encrypt:",
password="your-strong-password"
)
# Use it exactly like a normal YamlDB
db.set("cluster.admin_password", "super-secret-123")
db.set("cluster.api_key", "abc-123-def-456")
# The file 'secrets.enc' is now a binary blob that is unreadable
# without the correct password.
Web UI Prototype
YamlDB comes with a lightweight Web UI for visual data management.
To run the Web UI:
- Install dependencies:
pip install fastapi uvicorn - Run the server:
python yamldb/bin/run_webui.py - Open your browser to
http://localhost:8000
The Web UI allows you to browse the database tree, set/delete values via dot-notation, and monitor write efficiency in real-time.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file yamldb-2.0.0.tar.gz.
File metadata
- Download URL: yamldb-2.0.0.tar.gz
- Upload date:
- Size: 29.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a5f707e170a7a3e076265828953e2902a0ec7ec0f2b1af229a84f30ebea83821
|
|
| MD5 |
a70bccea7423c496d6c4cc65fadddf47
|
|
| BLAKE2b-256 |
69ea0c1d93cf3600531645e34a2cbc3ce06155547c13504c2753f71bada1d6d9
|
File details
Details for the file yamldb-2.0.0-py2.py3-none-any.whl.
File metadata
- Download URL: yamldb-2.0.0-py2.py3-none-any.whl
- Upload date:
- Size: 16.1 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a9f6cfb220593dd6beb2889d28c8ab30037a6c4ef496099ac6d03543f3dc91f5
|
|
| MD5 |
71e9e07a93422cf4af5b9a62e6ed7cff
|
|
| BLAKE2b-256 |
a37223e360ffe52dd4118f9d00a80a2908146da9c5e82d53060d83bb583e45ed
|