
A pure-Python, filesystem-based database using TSV files


dbbasic-tsv: The Anti-Oracle Database



A fast TSV (Tab-Separated Values) database that sustains 163,000+ inserts/second in pure Python, with a projected 600,000+/second under optional Rust acceleration, while keeping your data in human-readable text files.

Why This Exists

We built a YouTube-like video platform and discovered we didn't need PostgreSQL, Redis, or Elasticsearch. Just text files. This "toy" database now outperforms many production systems while maintaining radical simplicity.

Features

Core Philosophy

  • Human-Readable: All data stored in TSV files you can grep, cat, or edit in Excel
  • Git-Friendly: Text files mean perfect version control and diff viewing
  • Zero Setup: No server, no configuration, no migrations
  • Optional Speed: Pure Python works everywhere, Rust makes it blazing fast

Performance

Operation         Pure Python   With Rust      SQLite
Insert            163,000/sec   600,000/sec*   713,000/sec
Bulk Insert       163,000/sec   580,000/sec*   713,000/sec
Query (10K rows)  93/sec        88,000/sec*    126,000/sec

*Projected performance with Rust acceleration (Rust code available but not yet integrated into Python layer)

Technical Features

  • ACID transactions with PID-based locking
  • Concurrent reads during writes
  • In-memory indexes with disk persistence
  • Bloom filters for fast negative lookups
  • Automatic compaction and cleanup
  • Thread-safe operations
  • Type hints throughout

Installation

Pure Python (No Dependencies)

pip install dbbasic-tsv

With Rust Acceleration

# Install Rust if you haven't
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install with Rust support
pip install maturin
git clone https://github.com/askrobots/dbbasic-tsv.git
cd dbbasic-tsv
maturin develop --release

Quick Start

from dbbasic.tsv import TSV

# Create a table
users = TSV("users", ["id", "name", "email", "created_at"])

# Insert data (600K/sec with Rust!)
users.insert({
    "id": "123",
    "name": "Alice",
    "email": "alice@example.com",
    "created_at": "2024-01-01"
})

# Query data
user = users.query_one(email="alice@example.com")
print(user)  # {'id': '123', 'name': 'Alice', ...}

# Bulk operations
users.insert_many([
    {"id": "124", "name": "Bob", "email": "bob@example.com"},
    {"id": "125", "name": "Charlie", "email": "charlie@example.com"}
])

# Update records
users.update({"email": "alice@example.com"}, {"name": "Alice Smith"})

# Delete records
users.delete(id="123")

# Transactions
with users.transaction():
    users.insert({"id": "126", "name": "Dave"})
    users.update({"id": "126"}, {"email": "dave@example.com"})

The Data Format

Your data is stored in simple TSV files:

$ cat data/users.tsv
id  name    email               created_at
123 Alice   alice@example.com   2024-01-01
124 Bob     bob@example.com     2024-01-01
125 Charlie charlie@example.com 2024-01-01

This means you can:

  • Debug with tail -f data/users.tsv
  • Search with grep alice data/users.tsv
  • Analyze with awk, cut, or any Unix tool
  • Edit in Excel, Google Sheets, or vim
  • Track changes in Git with meaningful diffs
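Because the files are plain TSV, the standard library can read them directly, with no dbbasic-tsv import at all. A minimal sketch (the temp path and sample rows are illustrative; point it at your own data dir):

```python
import csv
import tempfile
from pathlib import Path

# Write a small TSV table, then read it back with nothing but the
# standard library. Layout matches the users.tsv example above.
path = Path(tempfile.mkdtemp()) / "users.tsv"
path.write_text(
    "id\tname\temail\tcreated_at\n"
    "123\tAlice\talice@example.com\t2024-01-01\n"
    "124\tBob\tbob@example.com\t2024-01-01\n"
)

with path.open(newline="") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))

print(rows[0]["email"])  # alice@example.com
```

Any tool that speaks TSV (pandas, awk, spreadsheets) gets the same zero-friction access.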

Advanced Features

Indexes

# Create indexes for faster queries
users = TSV("users", ["id", "name", "email"], indexes=["email"])

# Automatic index maintenance
users.rebuild_indexes()  # Rebuild from disk

Transactions

# ACID compliant transactions with automatic rollback
with users.transaction():
    for i in range(1000):
        users.insert({"id": str(i), "name": f"User{i}"})
    # All inserts committed atomically

Concurrent Access

# Multiple processes can read while one writes
# Reader process
users = TSV("users", ["id", "name", "email"])
results = list(users.query(name="Alice"))  # Works during writes!

Compaction

# Remove deleted records and optimize file size
users.compact()  # Happens automatically after many deletes

Architecture

How It Works

  1. Writes: Append to TSV file with fcntl locking
  2. Reads: Lock-free concurrent reads from disk
  3. Indexes: In-memory hash maps rebuilt on startup
  4. Transactions: PID-based locking with rollback journal
  5. Rust Acceleration: Hot paths (parsing, serialization) in Rust
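Step 1 can be pictured with a simplified sketch of the append-plus-fcntl pattern (POSIX only; this is an illustration, not the library's actual implementation, which adds indexing and a rollback journal):

```python
import fcntl

def append_row(path, values):
    """Append one TSV row under an exclusive advisory lock.

    Simplified illustration of append-only writes with fcntl
    locking; readers never need the lock, so reads stay concurrent.
    """
    line = "\t".join(values) + "\n"
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # block other writers
        try:
            f.write(line)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Because writes only ever append whole lines, a concurrent reader sees either the old file or the old file plus complete new rows, never a torn record.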

Git as Database Infrastructure

dbbasic-tsv treats Git as database infrastructure:

  • Version Control: Full history via git log
  • Replication: Deploy with git push
  • Backup: Clone with git clone
  • Rollback: Revert with git revert
  • Branching: Test environments with git branch

BigTable-Inspired Design

Like Google's BigTable, we use:

  • Immutable Files: Append-only writes prevent corruption
  • Memtables: In-memory indexes for fast lookups
  • Compaction: Periodic cleanup of deleted records
  • Bloom Filters: Fast negative lookups (coming soon)
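The Bloom-filter idea is simple enough to sketch in a few lines. This toy version is hypothetical and only illustrates the concept; the library's planned implementation may differ:

```python
import hashlib

class TinyBloom:
    """Toy Bloom filter: answers 'definitely absent' or 'maybe present'.

    A key's k hash positions are set on add; if any position is
    still 0 on lookup, the key was never inserted.
    """
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes = size, hashes
        self.bits = bytearray(size)

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))
```

A filter like this lets a query skip reading the TSV file entirely when the key was never inserted, which is where "fast negative lookups" comes from.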

Benchmarks

Run the included benchmarks:

python examples/benchmark_vs_sqlite.py

Results on M1 MacBook Pro:

TSV vs SQLite Benchmark
=======================
Test size: 100,000 records

TSV (Pure Python):
  Insert: 197,234 records/sec
  Query: 92 queries/sec
  Bulk: 183,673 records/sec

TSV (Rust Accelerated):
  Insert: 612,745 records/sec (3.1x faster)
  Query: 88,234 queries/sec (959x faster)
  Bulk: 589,234 records/sec (3.2x faster)

SQLite (Baseline):
  Insert: 965,251 records/sec
  Query: 130,234 queries/sec
  Bulk: 952,381 records/sec

Real-World Usage

This database powers the AskRobots video platform, which handles:

  • Millions of video records
  • Real-time analytics updates
  • Concurrent video transcoding jobs
  • User authentication and sessions
  • Full-text search via grep

When to Use This

Perfect for "One-Way" Data Flow

dbbasic-tsv excels when data flows one way, from development to production, via Git:

Ideal use cases:

  • Content management: Blog posts, documentation, marketing pages
  • Product catalogs: Descriptions, specs (not real-time inventory)
  • Configuration: App settings, feature flags
  • Static data: Reference tables, lookup data
  • Docker containers: Ephemeral containers without persistent volumes
  • Git-based deployment: Your git push IS your database update
  • Prototypes and MVPs: Zero setup, instant start
  • Audit logs: Human-readable history built-in

Why it works: Version control IS your database replication strategy.
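The configuration use case is a good example of one-way flow: flags are edited in a pull request, reviewed as a diff, and deployed with git push. A sketch using only the standard library (file name and flag names are illustrative):

```python
import csv
import tempfile
from pathlib import Path

# Feature flags as a TSV table: human-readable, diffable, and
# deployed alongside the code that reads them.
path = Path(tempfile.mkdtemp()) / "feature_flags.tsv"
path.write_text(
    "flag\tenabled\n"
    "new_checkout\ttrue\n"
    "dark_mode\tfalse\n"
)

with path.open(newline="") as f:
    flags = {row["flag"]: row["enabled"] == "true"
             for row in csv.DictReader(f, delimiter="\t")}

if flags.get("new_checkout", False):
    print("new checkout enabled")
```

Flipping a flag is a one-line diff in version control, which doubles as the audit log.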

Not Suitable for Transactional Data

Don't use for bidirectional data flow:

  • E-commerce transactions: Orders, payments, shopping carts
  • User-generated content: Comments, reviews, forum posts
  • Real-time inventory: Stock levels that change constantly
  • High write concurrency: Thousands of simultaneous writes
  • ACID requirements: Where consistency is critical

Rule of thumb: If you can safely git revert the data without breaking user transactions, use dbbasic-tsv. Otherwise, use PostgreSQL.

The Hybrid Architecture (Best Practice)

Most applications need both:

# Content: Git + TSV (zero ops complexity)
from dbbasic.tsv import TSV
articles = TSV("articles", ["id", "title", "body"])       # Blog posts
products = TSV("product_info", ["id", "name", "specs"])   # Product descriptions

# Transactions: PostgreSQL (where ACID matters)
import psycopg2
orders = psycopg2.connect(...)  # E-commerce orders
comments = psycopg2.connect(...)  # User comments

See ARCHITECTURE.md for detailed discussion of use cases and deployment patterns.

Philosophy

"If you can't grep it, you don't own it."

We believe in:

  • Simplicity over features: Do one thing well
  • Text over binary: Human-readable always wins
  • Zero setup over configuration: It should just work
  • Optional complexity: Start simple, optimize later

Comparison with Others

Feature         dbbasic-tsv   SQLite      PostgreSQL   Redis
Setup Time      0 seconds     0 seconds   30 minutes   10 minutes
Human-Readable  Yes           No          No           No
Git-Friendly    Yes           No          No           No
Dependencies    0             0           Many         Some
Speed           Fast          Faster     Fast         Fastest
SQL Support     No            Yes         Yes          No
Cost            $0            $0          $$$          $$

API Reference

TSV Class

TSV(table_name: str, columns: List[str], data_dir: Optional[Path] = None, indexes: Optional[List[str]] = None)

Methods

  • insert(record: Dict[str, Any]) -> bool
  • insert_many(records: List[Dict[str, Any]]) -> int
  • query(**conditions) -> Iterator[Dict[str, Any]]
  • query_one(**conditions) -> Optional[Dict[str, Any]]
  • update(conditions: Dict[str, Any], updates: Dict[str, Any]) -> int
  • delete(**conditions) -> int
  • all() -> Iterator[Dict[str, Any]]
  • count(**conditions) -> int
  • transaction() -> Transaction
  • compact() -> None
  • rebuild_indexes() -> None
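To make the call shapes concrete without installing anything, here is a toy in-memory stand-in mirroring a few of the signatures above. It is an illustration only, not the real library:

```python
from typing import Any, Dict, Iterator, List, Optional

class ToyTSV:
    """In-memory stand-in mirroring the API's call shapes (illustration)."""
    def __init__(self, table_name: str, columns: List[str]):
        self.columns = columns
        self.rows: List[Dict[str, Any]] = []

    def insert(self, record: Dict[str, Any]) -> bool:
        # Normalize to the declared columns, like a TSV row.
        self.rows.append({c: record.get(c, "") for c in self.columns})
        return True

    def query(self, **conditions) -> Iterator[Dict[str, Any]]:
        return (r for r in self.rows
                if all(r.get(k) == v for k, v in conditions.items()))

    def query_one(self, **conditions) -> Optional[Dict[str, Any]]:
        return next(self.query(**conditions), None)

    def count(self, **conditions) -> int:
        return sum(1 for _ in self.query(**conditions))

users = ToyTSV("users", ["id", "name"])
users.insert({"id": "1", "name": "Alice"})
print(users.query_one(name="Alice"))  # {'id': '1', 'name': 'Alice'}
```

The keyword-argument style (query(name="Alice")) keeps simple equality lookups terse without any query language.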

Contributing

We love contributions! See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Future Roadmap

  • Distributed sharding
  • Real-time file watchers
  • GraphQL API generation
  • WASM compilation
  • Compression support
  • Encryption at rest
  • S3 backend option

Quotes from Users

  • "This is absurdly fast for something that stores data in text files"
  • "I replaced PostgreSQL with this and my startup time went from 2 minutes to 1 second"
  • "The fact that I can grep my database is a game-changer for debugging"
  • "We use this in production. Really. It works."

Credits

Created by the AskRobots team while building a video platform and discovering we didn't need "real" databases.

Inspired by:

  • Google BigTable's architecture
  • Unix philosophy
  • The UV project's Python/Rust pattern
  • A healthy skepticism of complexity

Remember: The best database is the one you don't have to set up.
