Filesystem-based structured storage for data with metadata - the middle ground between scattered files and databases

These details have not been verified by PyPI

Project links

Project description

📚 Shelfie

Simple filesystem-based structured storage for data with metadata

Shelfie helps you organize your data files in a structured, hierarchical way while automatically managing metadata. Think of it as a filing system that creates organized directories based on your data's characteristics and keeps track of important information about each dataset.

🎯 Why Shelfie?

Organized: Automatically creates directory structures based on your data's fields
Metadata-aware: Stores attributes alongside your data files
Flexible: Works with any data that can be saved as CSV, JSON, or pickle
Simple: Intuitive API for creating and reading structured datasets
Discoverable: Easy to browse and understand your data organization in the filesystem

Shelfie is meant to be an in between a full database and having to create a wrapper for a filesystem based storage for each project.

🏗️ How It Works

Conceptual Model: Database Relations → Directory Structure

Shelfie translates database-style relationships into filesystem organization:

Database Thinking          →    Filesystem Result
─────────────────────────────────────────────────────
Tables: [experiments]      →    Directory Level 1
Tables: [models]           →    Directory Level 2  
Tables: [dates]            →    Directory Level 3
Columns: epochs, lr        →    metadata.json
Data: results.csv          →    Attached files

Visual Concept

Root Directory
├── .shelfie.pkl                    # Shelf configuration
├── experiment_1/                   # Field 1 value
│   ├── random_forest/              # Field 2 value  
│   │   ├── 2025-06-12/             # Field 3 value (auto-generated date)
│   │   │   ├── metadata.json       # Stored attributes
│   │   │   ├── results.csv         # Your data files
│   │   │   └── model.pkl          # More data files
│   │   └── gradient_boost/
│   │       └── 2025-06-12/
│   │           ├── metadata.json
│   │           └── results.csv
│   └── neural_network/
│       └── 2025-06-12/
│           ├── metadata.json
│           └── predictions.csv
└── experiment_2/
    └── ...

The Pattern

Shelfie = Filesystem-Based Relational Design

Fields → Directory hierarchy (what you'd normalize into separate tables)
Attributes → Stored metadata (what you'd store as columns in those tables)
Data → Files attached to each record (the actual data your database would reference)
File Paths → Automatically tracked as filename_path__ in metadata

Traditional Database:

SELECT r.accuracy, e.name, m.type, r.epochs 
FROM results r
JOIN experiments e ON r.experiment_id = e.id  
JOIN models m ON r.model_id = m.id
WHERE e.date = '2025-06-12'

Shelfie Equivalent:

data = load_from_shelf("./experiments")
results_df = data['results']  # Already has experiment, model, date columns!
filtered = results_df[results_df['date'] == '2025-06-12']

🚀 Quick Start

Installation

pip install shelfie

Basic Example

import pandas as pd
from shelfie import Shelf, DateField

# Create a shelf for ML experiments
ml_shelf = Shelf(
    root="./experiments",
    fields=["experiment", "model", DateField("date")],  # Directory structure
    attributes=["epochs", "learning_rate"]              # Required metadata
)

# Create a new experiment record
experiment = ml_shelf.create(
    experiment="baseline",
    model="mlp", 
    epochs=100,
    learning_rate=0.001  # Typical learning rate for neural networks
)

# Attach your results
results_df = pd.DataFrame({
    "accuracy": [0.85, 0.87, 0.89], 
    "loss": [0.45, 0.32, 0.28],
    "epoch": [1, 2, 3]
})

experiment.attach(results_df, "results.csv")

This creates:

experiments/
└── baseline/
    └── mlp/
        └── 2025-06-12/
            ├── metadata.json  # {"epochs": 100, "learning_rate": 0.001, "results_path__": "/path/to/results.csv"}
            └── results.csv    # Your data

📖 Detailed Examples

1. ML Experiment Tracking

Think of this as three related database tables:

experiments table → project field
models table → model_type field
runs table → date field
Attributes: dataset, hyperparams, notes

from shelfie import Shelf, DateField, TimestampField
import pandas as pd

# Set up experiment tracking (defines your "table" relationships)
experiments = Shelf(
    root="./ml_experiments",
    fields=["project", "model_type", DateField("date")],  # Your table hierarchy
    attributes=["dataset", "hyperparams", "notes"]        # Your table columns
)

# Log different experiments
mlp_experiment = experiments.create(
    project="customer_churn",
    model_type="mlp",
    dataset="v2_cleaned",
    hyperparams={"hidden_layers": [128, 64, 32], "dropout": 0.3, "activation": "relu"},
    notes="Multi-layer perceptron with dropout regularization"
)

# Attach multiple files
mlp_experiment.attach(train_results, "training_metrics.csv")
mlp_experiment.attach(test_results, "test_results.csv")
mlp_experiment.attach(feature_importance, "feature_importance.csv")

# Try a different model
cnn_experiment = experiments.create(
    project="customer_churn",
    model_type="cnn",
    dataset="v2_cleaned", 
    hyperparams={"filters": [32, 64, 128], "kernel_size": 3, "learning_rate": 0.0001},
    notes="Convolutional neural network approach"
)

2. Sales Data by Region and Time

Database equivalent:

regions table → region field
time_periods table → year, quarter fields
Attributes: analyst, report_type, data_source

# Organize sales data by geography and time (multi-table relationship)
sales_shelf = Shelf(
    root="./sales_data",
    fields=["region", "year", "quarter"],                    # Geographic + temporal tables
    attributes=["analyst", "report_type", "data_source"]     # Report metadata columns
)

# Store Q1 data for North America
na_q1 = sales_shelf.create(
    region="north_america",
    year="2025", 
    quarter="Q1",
    analyst="john_doe",
    report_type="quarterly_summary",
    data_source="salesforce"
)

sales_data = pd.DataFrame({
    "product": ["A", "B", "C"],
    "revenue": [150000, 200000, 180000],
    "units_sold": [1500, 2000, 1800]
})

na_q1.attach(sales_data, "quarterly_sales.csv")

3. Survey Data Organization

Database tables: survey_types → demographics → timestamps

# Organize survey responses by type and demographics
surveys = Shelf(
    root="./survey_data",
    fields=["survey_type", "demographic", TimestampField("timestamp")],  # Survey taxonomy
    attributes=["sample_size", "methodology", "response_rate"]            # Survey metadata
)

# Store customer satisfaction survey
survey = surveys.create(
    survey_type="customer_satisfaction",
    demographic="millennials",
    sample_size=1000,
    methodology="online_panel", 
    response_rate=0.23
)

responses = pd.DataFrame({
    "question_id": [1, 2, 3, 4, 5],
    "avg_score": [4.2, 3.8, 4.1, 3.9, 4.0],
    "response_count": [920, 915, 898, 901, 911]
})

survey.attach(responses, "responses.csv")

📊 Reading Your Data Back

The Magic: Automatic JOIN Operations

Unlike databases where you need explicit JOINs, Shelfie automatically combines your "table" relationships:

from shelfie import load_from_shelf

# Load all data from experiments shelf
data = load_from_shelf("./ml_experiments")

# Returns a dictionary of DataFrames - like running multiple JOINed queries:
# {
#   'metadata': All experiment metadata with project+model+date info,
#   'training_metrics': Training data with experiment context automatically joined,
#   'test_results': Test data with experiment context automatically joined,
#   ...
# }

# Analyze all your experiments - no JOINs needed!
print(data['metadata'])  # Overview of all experiments
print(data['training_metrics'])  # All training metrics with full context

# Note: File paths are stored as filename_path__ columns (e.g., 'training_metrics_path__')

What you get automatically:

Denormalized DataFrames: Each CSV gets experiment+model+date columns added
Full Context: Every row knows its complete "relational" context
No JOIN complexity: Relationships are already materialized
Pandas-ready: Immediate analysis without SQL knowledge

Each DataFrame automatically includes:

Original data columns: Your actual data
Attribute columns: Metadata from your "table columns" (hyperparams, notes, etc.)
Field columns: Directory structure as relational context (project, model_type, date)
File path columns: References as filename_path__ columns

🛠️ Advanced Features

Custom Fields with Defaults

from shelfie import Field, DateField, TimestampField

# Field with a default value
shelf = Shelf(
    root="./data",
    fields=[
        "experiment",
        Field("environment", default="production"),  # Always "production" unless specified
        DateField("date"),                          # Auto-generates today's date
        TimestampField("timestamp")                 # Auto-generates current timestamp
    ],
    attributes=["version"]
)

# Only need to specify non-default fields
record = shelf.create(
    experiment="test_1",
    version="1.0"
)
# Creates: ./data/test_1/production/2025-06-12/2025-06-12_14-30-45/
# Metadata includes: version_path__ for any attached files

Multiple File Types

# Attach different file types
record.attach(results_df, "results.csv")           # CSV
record.attach(model_config, "config.json")         # JSON  
record.attach(trained_model, "model.pkl")          # Pickle
record.attach(report_text, "summary.txt")          # Text

Loading Existing Shelves

# Load a shelf that was created elsewhere
existing_shelf = Shelf.load_from_root("./experiments")

# Continue adding to it
new_experiment = existing_shelf.create(
    experiment="advanced",
    model="transformer",
    epochs=50,
    learning_rate=0.0001  # Lower learning rate for transformer models
)

🗂️ Directory Structure Examples

Before Shelfie

my_project/
├── experiment1_mlp_results.csv
├── experiment1_mlp_model.pkl  
├── experiment2_cnn_results.csv
├── experiment2_cnn_model.pkl
├── baseline_test_data.csv
├── advanced_test_data.csv
└── notes.txt  # Which file belongs to what?

After Shelfie

my_project/
├── baseline/
│   ├── mlp/
│   │   └── 2025-06-12/
│   │       ├── metadata.json      # {"epochs": 100, "lr": 0.001, "results_path__": "/path/results.csv"}
│   │       ├── results.csv
│   │       └── model.pkl
│   └── cnn/
│       └── 2025-06-12/
│           ├── metadata.json      # {"epochs": 200, "lr": 0.0001, "results_path__": "/path/results.csv"}
│           ├── results.csv
│           └── model.pkl
└── advanced/
    └── transformer/
        └── 2025-06-12/
            ├── metadata.json      # {"epochs": 50, "lr": 0.0001, "results_path__": "/path/results.csv"}
            ├── results.csv
            └── model.pkl

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License.

Happy organizing! 📚✨

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.2.0

Jun 12, 2025

0.1.0

Jun 12, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

shelfie-0.2.0.tar.gz (34.0 kB view details)

Uploaded Jun 12, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

shelfie-0.2.0-py3-none-any.whl (10.7 kB view details)

Uploaded Jun 12, 2025 Python 3

File details

Details for the file shelfie-0.2.0.tar.gz.

File metadata

Download URL: shelfie-0.2.0.tar.gz
Upload date: Jun 12, 2025
Size: 34.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.31

File hashes

Hashes for shelfie-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`f1be82c8b42db7c3de62f1445c2cd74df4d274964b44b29229b94d1213aca649`
MD5	`cb0aef6c2a1cf4175d2ca0a10f3ceae8`
BLAKE2b-256	`23032c62d647292ee4b01114a3a183d1b7ace226de28ffe75a3b639a82aa3ea7`

See more details on using hashes here.

File details

Details for the file shelfie-0.2.0-py3-none-any.whl.

File metadata

Download URL: shelfie-0.2.0-py3-none-any.whl
Upload date: Jun 12, 2025
Size: 10.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.5.31

File hashes

Hashes for shelfie-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8850e11aa0226358ee3e97b66694b8ba163f448e02bc4efce626c1638c1d248c`
MD5	`c2a7f178ca2dc506ee48690a3cf1eba0`
BLAKE2b-256	`373bba0b4fc326302ff76563311b7df21b24a9a5234a31c817fe809735586dda`

See more details on using hashes here.

shelfie 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

📚 Shelfie

🎯 Why Shelfie?

🏗️ How It Works

Conceptual Model: Database Relations → Directory Structure

Visual Concept

The Pattern

🚀 Quick Start

Installation

Basic Example

📖 Detailed Examples

1. ML Experiment Tracking

2. Sales Data by Region and Time

3. Survey Data Organization

📊 Reading Your Data Back

🛠️ Advanced Features

Custom Fields with Defaults

Multiple File Types

Loading Existing Shelves

🗂️ Directory Structure Examples

Before Shelfie

After Shelfie

🤝 Contributing

📄 License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes