AI-Native Distributed Filesystem Architecture

Nexus: AI-Native Distributed Filesystem

Version 0.1.0 | AI Agent Infrastructure Platform

Nexus is an AI agent infrastructure platform that combines a distributed unified filesystem, self-evolving agent memory, intelligent document processing, and deployment from local development to hosted production, all from a single codebase.

Features

Foundation

  • Distributed Unified Filesystem: Multi-backend abstraction (S3, GDrive, SharePoint, LocalFS)
  • Tiered Storage: Hot/Warm/Cold tiers with automatic lineage tracking
  • Content-Addressable Storage: 30-50% storage savings via deduplication
  • "Everything as a File" Paradigm: Configuration, memory, jobs, and commands as files

Agent Intelligence

  • Self-Evolving Memory: Agent memory with automatic consolidation
  • Memory Versioning: Track knowledge evolution over time
  • Multi-Agent Sharing: Shared memory spaces within tenants
  • Memory Analytics: Effectiveness tracking and insights
  • Prompt Version Control: Track prompt evolution with lineage
  • Training Data Management: Version-controlled datasets with deduplication
  • Prompt Optimization: Multi-candidate testing, execution traces, tradeoff analysis
  • Experiment Tracking: Organize optimization runs, per-example results, regression detection

Content Processing

  • Rich Format Parsing: Extensible parsers (PDF, Excel, CSV, JSON, images)
  • LLM KV Cache Management: 50-90% cost savings on AI queries
  • Semantic Chunking: Better search via intelligent document segmentation
  • MCP Integration: Native Model Context Protocol server
  • Document Type Detection: Automatic routing to appropriate parsers

Operations

  • Resumable Jobs: Checkpointing system survives restarts
  • OAuth Token Management: Auto-refreshing credentials
  • Backend Auto-Mount: Automatic recognition and mounting
  • Resource Management: CPU throttling and rate limiting
  • Work Queue Detection: SQL views for efficient task scheduling and dependency resolution

Deployment Modes

Nexus supports two deployment modes from a single codebase:

| Mode | Use Case | Setup Time | Scaling |
|------|----------|------------|---------|
| Local | Individual developers, CLI tools, prototyping | 60 seconds | Single machine (~10GB) |
| Hosted | Teams and production (auto-scales) | Sign up | Automatic (GB to Petabytes) |

Note: Hosted mode scales infrastructure automatically. You don't choose between "monolithic" and "distributed"; Nexus handles that for you based on your usage.

Quick Start: Local Mode

import nexus

# Zero-deployment filesystem with AI features
# Config auto-discovered from nexus.yaml or environment
nx = nexus.connect()

async with nx:
    # Write and read files
    await nx.write("/workspace/data.txt", b"Hello World")
    content = await nx.read("/workspace/data.txt")

    # Semantic search across documents
    results = await nx.semantic_search(
        "/docs/**/*.pdf",
        query="authentication implementation"
    )

    # LLM-powered document reading with KV cache
    answer = await nx.llm_read(
        "/reports/q4.pdf",
        prompt="Summarize key findings",
        model="claude-sonnet-4"
    )

Config file (nexus.yaml):

mode: local
data_dir: ./nexus-data
cache_size_mb: 100
enable_vector_search: true

Quick Start: Hosted Mode

Coming Soon! Sign up for early access at nexus.ai

import nexus

# Connect to Nexus hosted instance
# Infrastructure scales automatically based on your usage
nx = nexus.connect(
    api_key="your-api-key",
    endpoint="https://api.nexus.ai"
)

async with nx:
    # Same API as local mode!
    await nx.write("/workspace/data.txt", b"Hello World")
    content = await nx.read("/workspace/data.txt")

For self-hosted deployments, see the Remote Nexus Server and Self-Hosted Deployment sections below.

Storage Backends

Nexus supports multiple storage backends through a unified API. All backends use Content-Addressable Storage (CAS) for automatic deduplication.
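To make the deduplication claim concrete, here is a minimal in-memory sketch of content-addressable storage. It is not the Nexus implementation: the `ContentStore` class and its attributes are hypothetical, and only the general idea (bytes stored once under their SHA-256 hash, with paths pointing at hashes and reference counts tracking usage) is taken from the text.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (illustrative only)."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}     # content hash -> stored bytes
        self.ref_counts: dict[str, int] = {}  # content hash -> number of paths using it
        self.paths: dict[str, str] = {}       # virtual path -> content hash

    def write(self, path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        previous = self.paths.get(path)
        if previous == digest:
            return digest  # identical content is already stored at this path
        if previous is not None:
            self._release(previous)
        # Keep the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.ref_counts[digest] = self.ref_counts.get(digest, 0) + 1
        self.paths[path] = digest
        return digest

    def read(self, path: str) -> bytes:
        return self.blobs[self.paths[path]]

    def _release(self, digest: str) -> None:
        # Drop one reference and delete the blob once nothing points at it.
        self.ref_counts[digest] -= 1
        if self.ref_counts[digest] == 0:
            del self.ref_counts[digest]
            del self.blobs[digest]


store = ContentStore()
store.write("/workspace/a.txt", b"Hello World")
store.write("/workspace/b.txt", b"Hello World")  # duplicate content, no new blob
print(len(store.paths), len(store.blobs))
```

Two paths end up sharing one stored blob, which is where the storage savings come from when many files have identical content.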

Local Backend (Default)

Store files on local filesystem:

import nexus

# Auto-detected from config or uses default
nx = nexus.connect()

# Or explicitly configure
nx = nexus.connect(config={
    "backend": "local",
    "data_dir": "./nexus-data"
})

Google Cloud Storage (GCS) Backend

Store files in Google Cloud Storage with local metadata:

import nexus

# Connect with GCS backend
nx = nexus.connect(config={
    "backend": "gcs",
    "gcs_bucket_name": "my-nexus-bucket",
    "gcs_project_id": "my-gcp-project",  # Optional
    "gcs_credentials_path": "/path/to/credentials.json",  # Optional
})

Authentication Methods:

  1. Service Account Key: Provide gcs_credentials_path
  2. Application Default Credentials (if not provided):
    • GOOGLE_APPLICATION_CREDENTIALS environment variable
    • gcloud auth application-default login credentials
    • GCE/Cloud Run service account (when running on GCP)

Using Config File (nexus.yaml):

backend: gcs
gcs_bucket_name: my-nexus-bucket
gcs_project_id: my-gcp-project  # Optional
# gcs_credentials_path: /path/to/credentials.json  # Optional

Using Environment Variables:

export NEXUS_BACKEND=gcs
export NEXUS_GCS_BUCKET_NAME=my-nexus-bucket
export NEXUS_GCS_PROJECT_ID=my-gcp-project  # Optional
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json  # Optional

CLI Usage with GCS:

# Write file to GCS
nexus write /workspace/data.txt "Hello GCS!" \
  --backend=gcs \
  --gcs-bucket=my-nexus-bucket

# Or use config file (simpler!)
nexus write /workspace/data.txt "Hello GCS!" --config=nexus.yaml

Advanced: Direct Backend API

For advanced use cases, instantiate backends directly:

from nexus import NexusFS, LocalBackend, GCSBackend

# Local backend
nx_local = NexusFS(
    backend=LocalBackend("/path/to/data"),
    db_path="./metadata.db"
)

# GCS backend
nx_gcs = NexusFS(
    backend=GCSBackend(
        bucket_name="my-bucket",
        project_id="my-project",
        credentials_path="/path/to/creds.json"
    ),
    db_path="./gcs-metadata.db"
)

# Same API for both!
nx_local.write("/file.txt", b"data")
nx_gcs.write("/file.txt", b"data")

Backend Comparison

| Feature | Local Backend | GCS Backend |
|---------|---------------|-------------|
| Content Storage | Local filesystem | Google Cloud Storage |
| Metadata Storage | Local SQLite | Local SQLite |
| Deduplication | ✅ CAS (30-50% savings) | ✅ CAS (30-50% savings) |
| Multi-machine Access | ❌ Single machine | ✅ Shared across machines |
| Durability | Single disk | 99.999999999% (11 nines) |
| Latency | <1ms (local) | 10-50ms (network) |
| Cost | Free (local disk) | GCS storage pricing |
| Use Case | Development, single machine | Teams, production, backup |

Coming Soon

  • Amazon S3 Backend (v0.7.0)
  • Azure Blob Storage (v0.7.0)
  • Google Drive (v0.7.0)
  • SharePoint (v0.7.0)

Installation

Using pip (Recommended)

# Install from PyPI
pip install nexus-ai-fs

# Verify installation
nexus --version

From Source (Development)

# Clone the repository
git clone https://github.com/nexi-lab/nexus.git
cd nexus

# Install using uv (recommended for faster installs)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"

# Or using pip
pip install -e ".[dev]"

Development Setup

# Install development dependencies
uv pip install -e ".[dev,test]"

# Run tests
pytest

# Run type checking
mypy src/nexus

# Format code
ruff format .

# Lint
ruff check .

CLI Usage

Nexus provides a command-line interface for all file operations. After installation, the nexus command is available.

Quick Start

# Initialize a new workspace
nexus init ./my-workspace

# Write a file
nexus write /workspace/hello.txt "Hello, Nexus!"

# Read a file
nexus cat /workspace/hello.txt

# List files
nexus ls /workspace
nexus ls /workspace --recursive
nexus ls /workspace --long  # Detailed view with metadata

Available Commands

File Operations

# Write content to a file
nexus write /path/to/file.txt "content"
echo "content" | nexus write /path/to/file.txt --input -

# Display file contents (with syntax highlighting)
nexus cat /workspace/code.py

# Copy files
nexus cp /source.txt /dest.txt

# Delete files
nexus rm /workspace/old-file.txt
nexus rm /workspace/old-file.txt --force  # Skip confirmation

# Show file information
nexus info /workspace/data.txt

Directory Operations

# Create directory
nexus mkdir /workspace/data
nexus mkdir /workspace/deep/nested/dir --parents

# Remove directory
nexus rmdir /workspace/data
nexus rmdir /workspace/data --recursive --force

File Discovery

# List files
nexus ls /workspace
nexus ls /workspace --recursive
nexus ls /workspace --long  # Show size, modified time, etag

# Find files by pattern (glob)
nexus glob "**/*.py"  # All Python files recursively
nexus glob "*.txt" --path /workspace  # Text files in workspace
nexus glob "test_*.py"  # Test files

# Search file contents (grep)
nexus grep "TODO"  # Find all TODO comments
nexus grep "def \w+" --file-pattern "**/*.py"  # Find function definitions
nexus grep "error" --ignore-case  # Case-insensitive search
nexus grep "TODO" --max-results 50  # Limit results

# Search modes (v0.2.0+)
nexus grep "revenue" --file-pattern "**/*.pdf"  # Auto mode: tries parsed first
nexus grep "revenue" --file-pattern "**/*.pdf" --search-mode=parsed  # Only parsed content
nexus grep "TODO" --search-mode=raw  # Only raw text (skip parsing)

# Result shows source type
# Match: TODO (parsed) ← from parsed PDF
# Match: TODO (raw) ← from source code

Work Queue Operations

# Query work items by status
nexus work ready --limit 10  # Get ready work items (high priority first)
nexus work pending  # Get pending work items
nexus work blocked  # Get blocked work items (with dependency info)
nexus work in-progress  # Get currently processing items

# View aggregate statistics
nexus work status  # Show counts for all work queues

# Output as JSON (for scripting)
nexus work ready --json
nexus work status --json

Note: Work items are files with special metadata (status, priority, depends_on, worker_id). See docs/SQL_VIEWS_FOR_WORK_DETECTION.md for details on setting up work queues.

Examples

Initialize and populate a workspace:

# Create workspace
nexus init ./my-project

# Create structure
nexus mkdir /workspace/src --data-dir ./my-project/nexus-data
nexus mkdir /workspace/tests --data-dir ./my-project/nexus-data

# Add files
echo "print('Hello World')" | nexus write /workspace/src/main.py --input - \
  --data-dir ./my-project/nexus-data

# List everything
nexus ls / --recursive --long --data-dir ./my-project/nexus-data

Find and analyze code:

# Find all Python files
nexus glob "**/*.py"

# Search for TODO comments
nexus grep "TODO|FIXME" --file-pattern "**/*.py"

# Find all test files
nexus glob "**/test_*.py"

# Search for function definitions
nexus grep "^def \w+\(" --file-pattern "**/*.py"

Work with data:

# Write JSON data
echo '{"name": "test", "value": 42}' | nexus write /data/config.json --input -

# Display with syntax highlighting
nexus cat /data/config.json

# Get file information
nexus info /data/config.json

Global Options

All commands support these global options:

# Use custom config file
nexus ls /workspace --config /path/to/config.yaml

# Override data directory
nexus ls /workspace --data-dir /path/to/nexus-data

# Combine both (config takes precedence)
nexus ls /workspace --config ./my-config.yaml --data-dir ./data

Help

Get help for any command:

nexus --help  # Show all commands
nexus ls --help  # Show help for ls command
nexus grep --help  # Show help for grep command

Remote Nexus Server

Nexus includes a JSON-RPC server that exposes the full NexusFileSystem interface over HTTP, enabling remote filesystem access and FUSE mounts to remote servers.

Quick Start

# Start the server (optional API key authentication)
nexus serve --host 0.0.0.0 --port 8080 --api-key mysecret

# Use remote filesystem from Python
from nexus import RemoteNexusFS

nx = RemoteNexusFS(
    server_url="http://localhost:8080",
    api_key="mysecret"  # Optional
)

# Same API as local NexusFS!
nx.write("/workspace/hello.txt", b"Hello Remote!")
content = nx.read("/workspace/hello.txt")
files = nx.list("/workspace", recursive=True)

Features

  • Full NexusFS Interface: All filesystem operations exposed over RPC (read, write, list, glob, grep, mkdir, etc.)
  • JSON-RPC 2.0 Protocol: Standard RPC protocol with proper error handling
  • API Key Authentication: Optional Bearer token authentication for security
  • Backend Agnostic: Works with local and GCS backends
  • FUSE Compatible: Mount remote Nexus servers as local filesystems

Remote Client Usage

from nexus import RemoteNexusFS

# Connect to remote server
nx = RemoteNexusFS(
    server_url="http://your-server:8080",
    api_key="your-api-key"  # Optional
)

# All standard operations work
nx.write("/workspace/data.txt", b"content")
content = nx.read("/workspace/data.txt")
files = nx.list("/workspace", recursive=True)
results = nx.glob("**/*.py")
matches = nx.grep("TODO", file_pattern="*.py")

Server Options

# Start with custom host/port
nexus serve --host 0.0.0.0 --port 8080

# Start with API key authentication
nexus serve --api-key mysecret

# Start with GCS backend
nexus serve --backend=gcs --gcs-bucket=my-bucket --api-key mysecret

# Custom data directory
nexus serve --data-dir /path/to/data

Deploying Nexus Server

For production use, deploy Nexus to a VM with persistent storage for metadata:

Example Docker Deployment:

# Build Docker image
cd /path/to/nexus
docker build -t nexus-server:latest .

# Run server with GCS backend
docker run -d \
  --name nexus-server \
  --restart unless-stopped \
  -p 8080:8080 \
  -v /var/lib/nexus:/app/data \
  -e NEXUS_API_KEY="your-api-key" \
  -e NEXUS_BACKEND=gcs \
  -e NEXUS_GCS_BUCKET="your-bucket-name" \
  -e NEXUS_GCS_PROJECT="YOUR-PROJECT-ID" \
  -e NEXUS_DB_PATH="/app/data/nexus-metadata.db" \
  nexus-server:latest

Deployment Features:

  • Persistent Metadata: SQLite database stored on VM disk at /var/lib/nexus/
  • Content Storage: All file content stored in configured backend (GCS, local, etc.)
  • Content Deduplication: CAS-based storage with 30-50% savings
  • Full NexusFS API: All operations available remotely

FUSE Mount: Use Standard Unix Tools (v0.2.0)

Mount Nexus to a local path and use any standard Unix tool seamlessly - ls, cat, grep, vim, and more!

Installation

First, install FUSE support:

# Install Nexus with FUSE support
pip install "nexus-ai-fs[fuse]"  # quotes keep shells such as zsh from expanding the brackets

# Platform-specific FUSE library:
# macOS: Install macFUSE from https://osxfuse.github.io/
# Linux: sudo apt-get install fuse3  # or equivalent for your distro

Quick Start

# Mount Nexus to local path (smart mode by default)
nexus mount /mnt/nexus

# Now use ANY standard Unix tools!
ls -la /mnt/nexus/workspace/
cat /mnt/nexus/workspace/notes.txt
grep -r "TODO" /mnt/nexus/workspace/
find /mnt/nexus -name "*.py"
vim /mnt/nexus/workspace/code.py
git clone /some/repo /mnt/nexus/repos/myproject

# Unmount when done
nexus unmount /mnt/nexus

Quick Start Examples

Example 1: Default (Explicit Views) - Best for Mixed Workflows

# Mount normally
nexus mount /mnt/nexus

# Binary tools work directly
evince /mnt/nexus/docs/report.pdf     # PDF viewer works ✓

# Add .txt for text operations
cat /mnt/nexus/docs/report.pdf.txt    # Read as text
grep "results" /mnt/nexus/docs/*.pdf.txt

# Virtual views auto-generated
ls /mnt/nexus/docs/
# → report.pdf
# → report.pdf.txt  (virtual)
# → report.pdf.md   (virtual)

Example 2: Auto-Parse - Best for Search-Heavy Workflows

# Mount with auto-parse
nexus mount /mnt/nexus --auto-parse

# grep works directly on PDFs!
grep "results" /mnt/nexus/docs/*.pdf      # No .txt needed! ✓
cat /mnt/nexus/docs/report.pdf            # Returns text ✓

# Search across everything
grep -r "TODO" /mnt/nexus/workspace/      # Searches PDFs, Excel, etc.

# Binary via .raw/ when needed
evince /mnt/nexus/.raw/docs/report.pdf   # For PDF viewer

Example 3: Real-World Script

#!/bin/bash
# Find all PDFs mentioning "invoice"

nexus mount /mnt/nexus --auto-parse --daemon

# Now grep works on PDFs!
grep -l "invoice" /mnt/nexus/documents/*.pdf

# Process results (read line by line so paths containing spaces survive)
grep -l "invoice" /mnt/nexus/documents/*.pdf | while IFS= read -r pdf; do
    echo "Found in: $pdf"
    grep -n "invoice" "$pdf" | head -5
done

nexus unmount /mnt/nexus

File Access: Two Modes

Nexus supports two ways to access files - choose what fits your workflow:

1. Explicit Views (Default) - Best for Compatibility

Binary files return binary, use .txt/.md suffixes for parsed content:

nexus mount /mnt/nexus

# Binary files work with native tools
evince /mnt/nexus/docs/report.pdf      # PDF viewer gets binary ✓
libreoffice /mnt/nexus/data/sheet.xlsx # Excel app gets binary ✓

# Add .txt to search/read as text
cat /mnt/nexus/docs/report.pdf.txt     # Returns parsed text
grep "pattern" /mnt/nexus/docs/*.pdf.txt

# Virtual views appear automatically
ls /mnt/nexus/docs/
# → report.pdf
# → report.pdf.txt  (virtual view)
# → report.pdf.md   (virtual view)

When to use: You want both binary tools AND text search to work

2. Auto-Parse Mode - Best for Search/Grep

Binary files return parsed text directly, use .raw/ for binary:

nexus mount /mnt/nexus --auto-parse

# Binary files return text directly - perfect for grep!
cat /mnt/nexus/docs/report.pdf         # Returns parsed text ✓
grep "pattern" /mnt/nexus/docs/*.pdf   # Works directly! ✓
less /mnt/nexus/docs/report.pdf        # Page through text ✓

# Access binary via .raw/ when needed
evince /mnt/nexus/.raw/docs/report.pdf # PDF viewer gets binary

# No .txt/.md suffixes - files return text by default
ls /mnt/nexus/docs/
# → report.pdf  (returns text when read)

When to use: Text search is your primary use case, binary tools are secondary
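The two access modes amount to a mapping from a path under the mount point to a backing file plus a representation. The sketch below is one way to express that mapping; `resolve_mount_path`, the representation labels, and the set of parseable extensions are assumptions for illustration, not the actual FUSE layer.

```python
import posixpath

PARSEABLE = {".pdf", ".xlsx", ".docx"}  # assumed set of parsed formats
VIEW_SUFFIXES = {".txt": "text", ".md": "markdown"}  # virtual view suffixes


def is_parseable(path: str) -> bool:
    return posixpath.splitext(path)[1].lower() in PARSEABLE


def resolve_mount_path(path: str, auto_parse: bool = False) -> tuple[str, str]:
    """Return (backing_path, representation) for a path relative to the mount root."""
    path = "/" + path.strip("/")
    if auto_parse:
        if path.startswith("/.raw/"):
            # .raw/ exposes the stored bytes unchanged.
            return path[len("/.raw"):], "raw"
        return path, ("text" if is_parseable(path) else "raw")
    for suffix, representation in VIEW_SUFFIXES.items():
        base = path[: -len(suffix)]
        # report.pdf.txt is a virtual view of report.pdf; notes.txt is a real file.
        if path.endswith(suffix) and is_parseable(base):
            return base, representation
    return path, "raw"


print(resolve_mount_path("/docs/report.pdf.txt"))
print(resolve_mount_path("/.raw/docs/report.pdf", auto_parse=True))
```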

Mount Modes (Content Parsing)

Control what gets parsed:

# Smart mode (default) - Auto-detect file types
nexus mount /mnt/nexus --mode=smart
# ✅ PDFs, Excel, Word → parsed
# ✅ .py, .txt, .md → pass-through
# ✅ Best for mixed content

# Text mode - Parse everything aggressively
nexus mount /mnt/nexus --mode=text
# ✅ All files parsed to text
# ⚠️  Slower (always parses)

# Binary mode - No parsing at all
nexus mount /mnt/nexus --mode=binary
# ✅ All files return binary
# ❌ grep won't work on PDFs

Comparison Table

| Feature | Explicit Views (default) | Auto-Parse Mode (--auto-parse) |
|---------|--------------------------|--------------------------------|
| PDF viewers work | evince file.pdf | ⚠️ evince .raw/file.pdf |
| grep on PDFs | ⚠️ grep *.pdf.txt | grep *.pdf |
| Excel apps work | libreoffice file.xlsx | ⚠️ libreoffice .raw/file.xlsx |
| Best for | Binary tools + search | Text search primary use case |
| Virtual views | .txt, .md suffixes | No suffixes needed |
| Binary access | Direct (file.pdf) | Via .raw/ directory |

Background (Daemon) Mode

Run the mount in the background so you can close your terminal:

# Mount in background
nexus mount /mnt/nexus --daemon

# Do your work...
ls /mnt/nexus
cat /mnt/nexus/workspace/file.txt

# Later, unmount when done
nexus unmount /mnt/nexus

Performance & Caching (v0.2.0)

FUSE mounts include automatic caching for improved performance. Caching is enabled by default with sensible defaults - no configuration needed for most users.

Default Performance:

  • ✅ Attribute caching (1024 entries, 60s TTL) - Makes ls and stat operations faster
  • ✅ Content caching (100 files) - Speeds up repeated file reads
  • ✅ Parsed content caching (50 files) - Accelerates PDF/Excel text extraction
  • ✅ Automatic cache invalidation on writes/deletes - Always consistent
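The attribute cache behaviour described above (a bounded number of entries, each with a time-to-live, invalidated on writes) can be sketched with the standard library. This `TTLCache` class is a hypothetical illustration, not the cache Nexus ships, and it omits the locking a thread-safe version would need.

```python
import time
from collections import OrderedDict
from collections.abc import Callable
from typing import Any


class TTLCache:
    """Bounded least-recently-used cache whose entries expire after `ttl` seconds."""

    def __init__(
        self,
        max_size: int = 1024,
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: OrderedDict[str, tuple[float, Any]] = OrderedDict()

    def get(self, key: str) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self.clock() + self.ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def invalidate(self, key: str) -> None:
        """Call on write, delete, or rename so readers never see stale data."""
        self._entries.pop(key, None)


attrs = TTLCache(max_size=1024, ttl=60)
attrs.put("/workspace/data.txt", {"size": 11})
print(attrs.get("/workspace/data.txt"))
```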

Advanced: Custom Cache Configuration

For power users with specific performance requirements:

from nexus import connect
from nexus.fuse import mount_nexus

nx = connect(config={"data_dir": "./nexus-data"})

# Custom cache configuration
cache_config = {
    "attr_cache_size": 2048,      # Double the attribute cache (default: 1024)
    "attr_cache_ttl": 120,         # Cache attributes for 2 minutes (default: 60s)
    "content_cache_size": 200,     # Cache 200 files (default: 100)
    "parsed_cache_size": 100,      # Cache 100 parsed files (default: 50)
    "enable_metrics": True         # Track cache hit/miss rates (default: False)
}

fuse = mount_nexus(
    nx,
    "/mnt/nexus",
    mode="smart",
    cache_config=cache_config,
    foreground=False
)

# View cache performance (if metrics enabled)
# Note: Access via fuse.fuse.operations.cache

Cache Configuration Options:

| Option | Default | Description |
|--------|---------|-------------|
| attr_cache_size | 1024 | Max number of cached file attribute entries |
| attr_cache_ttl | 60 | Time-to-live for attributes in seconds |
| content_cache_size | 100 | Max number of cached file contents |
| parsed_cache_size | 50 | Max number of cached parsed contents (PDFs, etc.) |
| enable_metrics | False | Enable cache hit/miss tracking |

When to Tune Cache Settings:

  • Large directory listings: Increase attr_cache_size to 2048+ and attr_cache_ttl to 120+
  • Many small files: Increase content_cache_size to 500+
  • Heavy PDF/Excel use: Increase parsed_cache_size to 200+
  • Performance analysis: Enable enable_metrics to measure cache effectiveness
  • Memory-constrained: Decrease all cache sizes (e.g., attr_cache_size 512, content_cache_size 50, parsed_cache_size 25)

Notes:

  • Caches are thread-safe - safe for concurrent access
  • Caches are automatically invalidated on file writes, deletes, and renames
  • Default settings work well for most use cases - tune only if needed

rclone-style CLI Commands (v0.2.0)

Nexus provides efficient file operations inspired by rclone, with automatic deduplication and progress tracking:

Sync Command

One-way synchronization with hash-based change detection:

# Sync local directory to Nexus (only copies changed files)
nexus sync ./local/dataset/ /workspace/training/

# Preview changes before syncing (dry-run)
nexus sync ./data/ /workspace/backup/ --dry-run

# Mirror sync - delete extra files in destination
nexus sync /workspace/source/ /workspace/dest/ --delete

# Disable hash comparison (force copy all files)
nexus sync ./data/ /workspace/ --no-checksum
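Hash-based change detection can be illustrated with a small planning function. This is a sketch under assumptions: `plan_sync` is a hypothetical helper, the trees are modelled as in-memory dictionaries, and SHA-256 stands in for whatever hash the real command compares.

```python
import hashlib


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_sync(
    source: dict[str, bytes],
    dest: dict[str, bytes],
    delete: bool = False,
    checksum: bool = True,
) -> dict[str, list[str]]:
    """Decide which relative paths to copy and delete for a one-way sync."""
    to_copy = []
    for rel_path, data in sorted(source.items()):
        unchanged = (
            checksum
            and rel_path in dest
            and content_hash(dest[rel_path]) == content_hash(data)
        )
        if not unchanged:
            to_copy.append(rel_path)
    # With delete enabled, files that exist only in the destination are removed.
    to_delete = sorted(set(dest) - set(source)) if delete else []
    return {"copy": to_copy, "delete": to_delete}


source = {"a.csv": b"1,2,3", "b.csv": b"4,5,6"}
dest = {"a.csv": b"1,2,3", "b.csv": b"stale", "old.csv": b"x"}
print(plan_sync(source, dest, delete=True))
```

Returning a plan instead of acting immediately is also what makes a dry run cheap: the same plan can be printed or executed.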

Copy Command

Smart copy with automatic deduplication:

# Copy directory recursively (skips identical files)
nexus copy ./local/data/ /workspace/project/ --recursive

# Copy within Nexus (leverages CAS deduplication)
nexus copy /workspace/source/ /workspace/dest/ --recursive

# Copy Nexus to local
nexus copy /workspace/data/ ./backup/ --recursive

# Copy single file
nexus copy /workspace/file.txt /workspace/copy.txt

# Disable checksum verification
nexus copy ./data/ /workspace/ --recursive --no-checksum

Move Command

Efficient file/directory moves with confirmation prompts:

# Move file (rename if possible, copy+delete otherwise)
nexus move /workspace/old.txt /workspace/new.txt

# Move directory without confirmation
nexus move /workspace/old_dir/ /archives/2024/ --force

Tree Command

Visualize directory structure as ASCII tree:

# Show full directory tree
nexus tree /workspace/

# Limit depth to 2 levels
nexus tree /workspace/ -L 2

# Show file sizes
nexus tree /workspace/ --show-size

Size Command

Calculate directory sizes with human-readable output:

# Calculate total size
nexus size /workspace/project/

# Human-readable output (KB, MB, GB)
nexus size /workspace/ --human

# Show top 10 largest files
nexus size /workspace/ --human --details

Features:

  • Hash-based deduplication - Only copies changed files
  • Progress bars - Visual feedback for long operations
  • Dry-run mode - Preview changes before execution
  • Cross-platform paths - Works with local filesystem and Nexus paths
  • Automatic deduplication - Leverages Content-Addressable Storage (CAS)

Performance Comparison

| Method | Speed | Content-Aware | Use Case |
|--------|-------|---------------|----------|
| grep -r /mnt/nexus/ | Medium | ✅ Yes (via mount) | Interactive use |
| nexus grep "pattern" | Fast (DB-backed) | ✅ Yes | Large-scale search |
| Standard tools | Familiar | ✅ Yes (via mount) | Day-to-day work |

Use Cases

Interactive Development:

# Mount for interactive work
nexus mount /mnt/nexus
vim /mnt/nexus/workspace/code.py
git clone /mnt/nexus/repos/myproject

Bulk Operations:

# Use rclone-style commands for efficiency
nexus sync /local/dataset/ /workspace/training-data/
nexus tree /workspace/ > structure.txt

Automated Workflows:

# Standard Unix tools in scripts
find /mnt/nexus -name "*.pdf" -exec grep -l "invoice" {} \;
rsync -av /mnt/nexus/workspace/ /backup/

Architecture

Agent Workspace Structure

Every agent gets a structured workspace at /workspace/{tenant}/{agent}/:

/workspace/acme-corp/research-agent/
├── .nexus/                          # Nexus metadata (Git-trackable)
│   ├── agent.yaml                   # Agent configuration
│   ├── commands/                    # Custom commands (markdown files)
│   │   ├── analyze-codebase.md
│   │   └── summarize-docs.md
│   ├── jobs/                        # Background job definitions
│   │   └── daily-summary.yaml
│   ├── memory/                      # File-based memory
│   │   ├── project-knowledge.md
│   │   └── recent-tasks.jsonl
│   └── secrets.encrypted            # KMS-encrypted credentials
├── data/                            # Agent's working data
│   ├── inputs/
│   └── outputs/
└── INSTRUCTIONS.md                  # Agent instructions (auto-loaded)

Path Namespace

/
├── workspace/        # Agent scratch space (hot tier, ephemeral)
├── shared/           # Shared tenant data (warm tier, persistent)
├── external/         # Pass-through backends (no content storage)
├── system/           # System metadata (admin-only)
└── archives/         # Cold storage (read-only)

Core Components

File System Operations

import nexus

# Works in both local and hosted modes
# Mode determined by config file or environment
nx = nexus.connect()

async with nx:
    # Basic operations
    await nx.write("/workspace/data.txt", b"content")
    content = await nx.read("/workspace/data.txt")
    await nx.delete("/workspace/data.txt")

    # Batch operations
    files = await nx.list("/workspace/", recursive=True)
    results = await nx.copy_batch(sources, destinations)

    # File discovery
    python_files = await nx.glob("**/*.py")
    todos = await nx.grep(r"TODO:|FIXME:", file_pattern="*.py")

Semantic Search

# Search across documents with vector embeddings
async with nexus.connect() as nx:
    results = await nx.semantic_search(
        path="/docs/",
        query="How does authentication work?",
        limit=10,
        filters={"file_type": "markdown"}
    )

    for result in results:
        print(f"{result.path}:{result.line} - {result.text}")

LLM-Powered Reading

# Read documents with AI, with automatic KV cache
async with nexus.connect() as nx:
    answer = await nx.llm_read(
        path="/reports/q4-2024.pdf",
        prompt="What were the top 3 challenges?",
        model="claude-sonnet-4",
        max_tokens=1000
    )

Agent Memory

# Store and retrieve agent memories
async with nexus.connect() as nx:
    await nx.store_memory(
        content="User prefers TypeScript over JavaScript",
        memory_type="preference",
        tags=["coding", "languages"]
    )

    memories = await nx.search_memories(
        query="programming language preferences",
        limit=5
    )

Prompt Optimization (Coming in v0.9.5)

# Track multiple prompt candidates during optimization
async with nexus.connect() as nx:
    # Start optimization run
    run_id = await nx.start_optimization_run(
        module_name="SearchModule",
        objectives=["accuracy", "latency", "cost"]
    )

    # Store prompt candidates with detailed traces
    for candidate in prompt_variants:
        version_id = await nx.store_prompt_version(
            module_name="SearchModule",
            prompt_template=candidate.template,
            metrics={"accuracy": 0.85, "latency_ms": 450},
            run_id=run_id
        )

        # Store execution traces for debugging
        await nx.store_execution_trace(
            prompt_version_id=version_id,
            inputs=test_inputs,
            outputs=predictions,
            intermediate_steps=reasoning_chain
        )

    # Analyze tradeoffs across candidates
    analysis = await nx.analyze_prompt_tradeoffs(
        run_id=run_id,
        objectives=["accuracy", "latency_ms", "cost_per_query"]
    )

    # Get per-example results to find failure patterns
    failures = await nx.get_failing_examples(
        prompt_version_id=version_id,
        limit=20
    )

Custom Commands

Create /workspace/{tenant}/{agent}/.nexus/commands/semantic-search.md:

---
name: semantic-search
description: Search codebase semantically
allowed-tools: [semantic_read, glob, grep]
required-scopes: [read]
model: sonnet
---

## Your task

Given query: {{query}}

1. Use `glob` to find relevant files by pattern
2. Use `semantic_read` to extract relevant sections
3. Summarize findings with file:line citations

Execute via API:

async with nexus.connect() as nx:
    result = await nx.execute_command(
        "semantic-search",
        context={"query": "authentication implementation"}
    )
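To show how a command file of this shape could be turned into a prompt, the sketch below splits the front matter from the body and fills `{{name}}` placeholders from the context. It is a simplified stand-in: `parse_command` and `render` are hypothetical, and front-matter values are kept as raw strings rather than parsed as YAML.

```python
import re


def parse_command(markdown: str) -> tuple[dict[str, str], str]:
    """Split a command file into (front matter, body)."""
    lines = markdown.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown.strip()
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        raise ValueError("front matter is not terminated by ---")
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()  # values stay unparsed strings
    return meta, "\n".join(lines[end + 1:]).strip()


def render(body: str, context: dict[str, str]) -> str:
    """Replace {{name}} placeholders with values from the context."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(context[m.group(1)]), body)


command_file = """---
name: semantic-search
model: sonnet
---

Given query: {{query}}
"""
meta, body = parse_command(command_file)
print(meta["name"], "->", render(body, {"query": "authentication implementation"}))
```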

Technology Stack

Core

  • Language: Python 3.11+
  • API Framework: FastAPI
  • Database: PostgreSQL (prod) / SQLite (dev)
  • Cache: Redis (prod) / In-memory (dev)
  • Vector DB: Qdrant
  • Object Storage: S3-compatible, GCS, Azure Blob

AI/ML

  • LLM Providers: Anthropic Claude, OpenAI, Google Gemini
  • Embeddings: text-embedding-3-large, voyage-ai
  • Parsing: PyPDF2, pandas, openpyxl, Pillow

Infrastructure

  • Orchestration: Kubernetes (distributed mode)
  • Monitoring: Prometheus + Grafana
  • Logging: Structlog + Loki
  • Admin UI: Simple HTML/JS (jobs, memories, files, operations)

Performance Targets

| Metric | Target | Impact |
|--------|--------|--------|
| Write Throughput | 500-1000 MB/s | 10-50× vs direct backend |
| Read Latency | <10ms | 10-50× vs remote storage |
| Memory Search | <100ms | Vector search across memories |
| Storage Savings | 30-50% | CAS deduplication |
| Job Resumability | 100% | Survives all restarts |
| LLM Cache Hit Rate | 50-90% | Major cost savings |
| Prompt Versioning | Full lineage | Track optimization history |
| Training Data Dedup | 30-50% | CAS-based deduplication |
| Prompt Optimization | Multi-candidate | Test multiple strategies in parallel |
| Trace Storage | Full execution logs | Debug failures, analyze patterns |

Configuration

Local Mode

import nexus

# Config via Python (useful for programmatic configuration)
nx = nexus.connect(config={
    "mode": "local",
    "data_dir": "./nexus-data",
    "cache_size_mb": 100,
    "enable_vector_search": True
})

# Or let it auto-discover from nexus.yaml
nx = nexus.connect()

Self-Hosted Deployment

For organizations that want to run their own Nexus instance, create config.yaml:

mode: server  # local or server

database:
  url: postgresql://user:pass@localhost/nexus
  # or for SQLite: sqlite:///./nexus.db

cache:
  type: redis  # memory, redis
  url: redis://localhost:6379

vector_db:
  type: qdrant
  url: http://localhost:6333

backends:
  - type: s3
    bucket: my-company-files
    region: us-east-1

  - type: gdrive
    credentials_path: ./gdrive-creds.json

auth:
  jwt_secret: your-secret-key
  token_expiry_hours: 24

rate_limits:
  default: "100/minute"
  semantic_search: "10/minute"
  llm_read: "50/hour"
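The rate limit strings in this config follow a "count/period" shape. The sketch below parses that shape and applies it with a fixed-window counter. Both `parse_rate` and `FixedWindowLimiter` are hypothetical; the algorithm Nexus actually uses is not stated, and a fixed window is simply the easiest one to show.

```python
import time
from collections.abc import Callable

PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate(spec: str) -> tuple[int, int]:
    """Turn a string such as "100/minute" into (limit, window_seconds)."""
    count, _, period = spec.partition("/")
    return int(count), PERIOD_SECONDS[period.strip()]


class FixedWindowLimiter:
    """Allow at most `limit` calls per window; counters reset when a window ends."""

    def __init__(self, spec: str, clock: Callable[[], float] = time.monotonic) -> None:
        self.limit, self.window = parse_rate(spec)
        self.clock = clock  # injectable so tests do not need to sleep
        self.window_start = clock()
        self.used = 0

    def allow(self) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has started: reset the counter.
            self.window_start, self.used = now, 0
        if self.used < self.limit:
            self.used += 1
            return True
        return False


limiter = FixedWindowLimiter("10/minute")
print(parse_rate("50/hour"), limiter.allow())
```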

Run server:

nexus serve --config config.yaml

Security

Multi-Layer Security Model

  1. API Key Authentication: Tenant and agent identification
  2. Row-Level Security (RLS): Database-level tenant isolation
  3. Type-Level Validation: Fail-fast validation before database operations
  4. UNIX-Style Permissions: Owner, group, and mode bits (coming in v0.2.0)
  5. ACL Permissions: Fine-grained access control lists (coming in v0.2.0)

Type-Level Validation (NEW in v0.1.0)

All domain types have validation methods that are called automatically before database operations. This provides:

  • Fail Fast: Catch invalid data before expensive database operations
  • Clear Error Messages: Actionable feedback for developers and API consumers
  • Data Integrity: Prevent invalid data from entering the database
  • Consistent Validation: Same rules across all code paths
from nexus.core.metadata import FileMetadata
from nexus.core.exceptions import ValidationError

# Validation happens automatically on put()
# (`store` is assumed to be an existing metadata store instance)
try:
    metadata = FileMetadata(
        path="/data/file.txt",  # Must start with /
        backend_name="local",
        physical_path="/storage/file.txt",
        size=1024,  # Must be >= 0
    )
    store.put(metadata)  # Validates before DB operation
except ValidationError as e:
    print(f"Validation failed: {e}")
    # Example: "size cannot be negative, got -1"

Validation Rules:

  • Paths must start with / and not contain null bytes
  • File sizes and ref counts must be non-negative
  • Required fields (path, backend_name, physical_path, etc.) must not be empty
  • Content hashes must be valid 64-character SHA-256 hex strings
  • Metadata keys must be ≤ 255 characters

Example: Multi-Tenancy Isolation

-- RLS automatically filters queries by tenant
SET LOCAL app.current_tenant_id = '<tenant_uuid>';

-- All queries are filtered automatically, even if application code omits a tenant filter
SELECT * FROM file_paths WHERE path = '/data';
-- Returns only rows for current tenant
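
For the filtering above to take effect, the table needs row-level security enabled with a policy that reads the tenant setting, and each transaction has to set that setting. The sketch below shows how an application might prepare those statements. The table name file_paths and the setting app.current_tenant_id come from the example; the tenant_id column, the policy name tenant_isolation, and the helper tenant_scope_statement are assumptions, and the %s placeholder assumes a driver using the DB-API "format" paramstyle (such as psycopg).

```python
import uuid

# One-time setup statements (PostgreSQL). The tenant_id column and the policy
# name are assumptions for illustration.
POLICY_STATEMENTS = [
    "ALTER TABLE file_paths ENABLE ROW LEVEL SECURITY",
    "CREATE POLICY tenant_isolation ON file_paths "
    "USING (tenant_id = current_setting('app.current_tenant_id')::uuid)",
]


def tenant_scope_statement(tenant_id: str) -> tuple[str, tuple[str]]:
    """Return (sql, params) that scope the current transaction to one tenant.

    SET LOCAL does not accept bind parameters, so the equivalent
    set_config(name, value, is_local) function is used instead.
    """
    normalized = str(uuid.UUID(tenant_id))  # raises ValueError if malformed
    return ("SELECT set_config('app.current_tenant_id', %s, true)", (normalized,))


sql, params = tenant_scope_statement("123e4567-e89b-12d3-a456-426614174000")
# With a real connection this would run inside a transaction, for example:
#   cursor.execute(sql, params)
#   cursor.execute("SELECT * FROM file_paths WHERE path = %s", ("/data",))
print(sql, params)
```

Note that in PostgreSQL, superusers and (by default) the table owner bypass row-level security, so the application would normally connect as a separate, unprivileged role.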

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=nexus --cov-report=html

# Run specific test file
pytest tests/test_filesystem.py

# Run integration tests
pytest tests/integration/ -v

# Run performance tests
pytest tests/performance/ --benchmark-only

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

# Fork the repo and clone
git clone https://github.com/yourusername/nexus.git
cd nexus

# Create a feature branch
git checkout -b feature/your-feature

# Make changes and test
uv pip install -e ".[dev,test]"
pytest

# Format and lint
ruff format .
ruff check .

# Commit and push
git commit -am "Add your feature"
git push origin feature/your-feature

License

Apache 2.0 License - see LICENSE for details.

Roadmap

v0.1.0 - Local Mode Foundation

  • Core embedded filesystem (read/write/delete)
  • SQLite metadata store
  • Local filesystem backend
  • Basic file operations (list, glob, grep)
  • Virtual path routing
  • Directory operations (mkdir, rmdir, is_directory)
  • Basic CLI interface with Click and Rich
  • Metadata export/import (JSONL format)
  • SQL views for ready work detection
  • In-memory caching
  • Batch operations (avoid N+1 queries)
  • Type-level validation

v0.2.0 - FUSE Mount & Content-Aware Operations (Current)

  • FUSE filesystem mount - Mount Nexus to local path (e.g., /mnt/nexus)
  • Smart read mode - Return parsed text for binary files (PDFs, Excel, etc.)
  • Virtual file views - Auto-generate .txt and .md views for binary files
  • Content parser framework - Extensible parser system for document types (MarkItDown)
  • PDF parser - Extract text and markdown from PDFs
  • Excel/CSV parser - Parse spreadsheets to structured data
  • Content-aware file access - Access parsed content via virtual views
  • Document type detection - Auto-detect MIME types and route to parsers
  • Mount CLI commands - nexus mount, nexus unmount
  • Mount modes - Binary, text, and smart modes
  • .raw directory - Access original binary files
  • Background daemon mode - Run mount in background with --daemon
  • All FUSE operations - read, write, create, delete, mkdir, rmdir, rename, truncate
  • Unit tests - Comprehensive test coverage for FUSE operations
  • rclone-style CLI commands - sync, copy, move, tree, size with progress bars
  • Background parsing - Async content parsing on write
  • FUSE performance optimizations - Caching (TTL/LRU), cache invalidation, metrics
  • Image OCR parser - Extract text from images (PNG, JPEG)

v0.3.0 - File Permissions & Skills System

  • UNIX-style file permissions (owner, group, mode)
  • Permission operations (chmod, chown, chgrp)
  • Default permission policies per namespace
  • Permission inheritance for new files
  • Permission checking in all file operations
  • ACL (Access Control List) support
  • ReBAC (Relationship-Based Access Control) - Zanzibar-style authorization
  • Relationship types - member-of, owner-of, parent-of, shared-with
  • Permission inheritance via relationships - Team ownership, group membership
  • Relationship graph queries - Transitive closure, path existence checks
  • Namespaced tuples - (subject, relation, object) authorization model
  • Check API - Fast permission checks with caching
  • Expand API - Discover all subjects with specific permissions
  • Relationship management - Create, delete, query relationships
  • Permission migration for existing files
  • Comprehensive permission tests
  • Skills System integration - Anthropic-compatible SKILL.md format
  • Skill discovery & loading - Progressive disclosure, lazy loading
  • Agent-specific skills - /workspace/{tenant}/{agent}/.nexus/skills/
  • Tenant-wide skill library - /shared/{tenant}/skills/
  • System skills - /system/skills/ (Anthropic official)
  • Skill templates - Pre-built templates for common patterns
  • Skill versioning - CAS-backed version control
  • Skill composition - Automatic dependency resolution
  • Skill analytics - Usage tracking, success rates
  • Skill marketplace - Org-wide skill catalog
  • Skill CLI commands - create, fork, publish, search
  • Semantic skill search - Find skills by description

v0.4.0 - AI Integration

  • LLM provider abstraction
  • Anthropic Claude integration
  • OpenAI integration
  • Basic KV cache for prompts
  • Semantic search (vector embeddings)
  • LLM-powered document reading

v0.5.0 - Agent Workspaces

  • Agent workspace structure
  • File-based configuration (.nexus/)
  • Custom command system (markdown)
  • Basic agent memory storage
  • Memory consolidation
  • Memory reflection phase (ACE-inspired: extract insights from execution trajectories)
  • Strategy/playbook organization (ACE-inspired: organize memories as reusable strategies)

v0.6.0 - Server Mode (Self-Hosted & Managed)

  • FastAPI REST API
  • API key authentication
  • Multi-tenancy support
  • PostgreSQL support
  • Redis caching
  • Docker deployment
  • Batch/transaction APIs (atomic multi-operation updates)
  • Optimistic locking for concurrent writes
  • Auto-scaling configuration (for hosted deployments)

v0.7.0 - Extended Features & Event System

  • S3 backend support
  • Google Drive backend
  • Job system with checkpointing
  • OAuth token management
  • MCP server implementation
  • Webhook/event system (file changes, memory updates, job events)
  • Watch API for real-time updates (streaming changes to clients)
  • Server-Sent Events (SSE) support for live monitoring
  • Simple admin UI (jobs, memories, files, operation logs)
  • Operation logs table (track storage operations for debugging)

v0.8.0 - Advanced AI Features & Rich Query

  • Advanced KV cache with context tracking
  • Memory versioning and lineage
  • Multi-agent memory sharing
  • Enhanced semantic search
  • Importance-based memory preservation (ACE-inspired: prevent brevity bias in consolidation)
  • Context-aware memory retrieval (include execution context in search)
  • Automated strategy extraction (LLM-powered extraction from successful trajectories)
  • Rich memory query language (filter by metadata, importance, task type, date ranges, etc.)
  • Memory query builder API (fluent interface for complex queries)
  • Combined vector + metadata search (hybrid search)

v0.9.0 - Production Readiness

  • Monitoring and observability
  • Performance optimization
  • Comprehensive testing
  • Security hardening
  • Documentation completion
  • Optional OpenTelemetry export (for framework integration)

v0.9.5 - Prompt Engineering & Optimization

  • Prompt version control with lineage tracking
  • Training dataset storage with CAS deduplication
  • Evaluation metrics time series (performance tracking)
  • Frozen inference snapshots (immutable program state)
  • Experiment tracking export (MLflow, W&B integration)
  • Prompt diff viewer (compare versions)
  • Regression detection alerts (performance drops)
  • Multi-candidate pool management (concurrent prompt testing)
  • Execution trace storage (detailed run logs for debugging)
  • Per-example evaluation results (granular performance tracking)
  • Optimization run grouping (experiment management)
  • Multi-objective tradeoff analysis (accuracy vs latency vs cost)

v0.10.0 - Production Infrastructure & Auto-Scaling

  • Automatic infrastructure scaling
  • Redis distributed locks (for large deployments)
  • PostgreSQL replication (for high availability)
  • Kubernetes deployment templates
  • Multi-region load balancing
  • Automatic migration from single-node to distributed

v1.0.0 - Production Release

  • Complete feature set
  • Production-tested
  • Comprehensive documentation
  • Migration tools
  • Enterprise support


Built with ❤️ by the Nexus team
