AI-Native Distributed Filesystem Architecture

Nexus: AI-Native Distributed Filesystem

Version 0.1.0 | AI Agent Infrastructure Platform

Nexus is a complete AI agent infrastructure platform that combines a distributed unified filesystem, self-evolving agent memory, intelligent document processing, and seamless deployment from local development to hosted production, all from a single codebase.

Features

Foundation

Distributed Unified Filesystem: Multi-backend abstraction (LocalFS and GCS today; S3, Azure Blob, Google Drive, and SharePoint planned)
Tiered Storage: Hot/Warm/Cold tiers with automatic lineage tracking
Content-Addressable Storage: 30-50% storage savings via deduplication
"Everything as a File" Paradigm: Configuration, memory, jobs, and commands as files

Agent Intelligence

Self-Evolving Memory: Agent memory with automatic consolidation
Memory Versioning: Track knowledge evolution over time
Multi-Agent Sharing: Shared memory spaces within tenants
Memory Analytics: Effectiveness tracking and insights
Prompt Version Control: Track prompt evolution with lineage
Training Data Management: Version-controlled datasets with deduplication
Prompt Optimization: Multi-candidate testing, execution traces, tradeoff analysis
Experiment Tracking: Organize optimization runs, per-example results, regression detection

Content Processing

Rich Format Parsing: Extensible parsers (PDF, Excel, CSV, JSON, images)
LLM KV Cache Management: 50-90% cost savings on AI queries
Semantic Chunking: Better search via intelligent document segmentation
MCP Integration: Native Model Context Protocol server
Document Type Detection: Automatic routing to appropriate parsers

Operations

Resumable Jobs: Checkpointing system survives restarts
OAuth Token Management: Auto-refreshing credentials
Backend Auto-Mount: Automatic recognition and mounting
Resource Management: CPU throttling and rate limiting
Work Queue Detection: SQL views for efficient task scheduling and dependency resolution

Deployment Modes

Nexus supports two deployment modes from a single codebase:

Mode	Use Case	Setup Time	Scaling
Local	Individual developers, CLI tools, prototyping	60 seconds	Single machine (~10GB)
Hosted	Teams and production (auto-scales)	Sign up	Automatic (GB to Petabytes)

Note: Hosted mode scales infrastructure automatically under the hood, so you don't choose between "monolithic" and "distributed" deployments. Nexus handles that for you based on your usage.

Quick Start: Local Mode

import asyncio
import nexus


async def main():
    # Zero-deployment filesystem with AI features
    # Config auto-discovered from nexus.yaml or environment
    nx = nexus.connect()

    async with nx:
        # Write and read files
        await nx.write("/workspace/data.txt", b"Hello World")
        content = await nx.read("/workspace/data.txt")

        # Semantic search across documents
        results = await nx.semantic_search(
            "/docs/**/*.pdf",
            query="authentication implementation"
        )

        # LLM-powered document reading with KV cache
        answer = await nx.llm_read(
            "/reports/q4.pdf",
            prompt="Summarize key findings",
            model="claude-sonnet-4"
        )


asyncio.run(main())

Config file (nexus.yaml):

mode: local
data_dir: ./nexus-data
cache_size_mb: 100
enable_vector_search: true

Quick Start: Hosted Mode

Coming Soon! Sign up for early access at nexus.ai

import asyncio
import nexus


async def main():
    # Connect to Nexus hosted instance
    # Infrastructure scales automatically based on your usage
    nx = nexus.connect(
        api_key="your-api-key",
        endpoint="https://api.nexus.ai"
    )

    async with nx:
        # Same API as local mode!
        await nx.write("/workspace/data.txt", b"Hello World")
        content = await nx.read("/workspace/data.txt")


asyncio.run(main())

For self-hosted deployments, see the Remote Nexus Server and Self-Hosted Deployment sections below.

Storage Backends

Nexus supports multiple storage backends through a unified API. All backends use Content-Addressable Storage (CAS) for automatic deduplication.
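As a rough illustration of how content-addressable deduplication works in general, the sketch below keys each blob by its SHA-256 digest and reference-counts the paths that point at it, so identical content written to two paths is stored only once. `ContentStore` and its methods are hypothetical and are not the Nexus API; in Nexus itself, metadata lives in SQLite and content lives in the configured backend.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (hypothetical, not the Nexus API)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}    # SHA-256 hex digest -> content
        self._refs: dict[str, int] = {}       # digest -> number of paths using it
        self._paths: dict[str, str] = {}      # virtual path -> digest

    def write(self, path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        old = self._paths.get(path)
        if old == digest:
            return digest                     # same content already at this path
        if old is not None:
            self._release(old)                # path is being overwritten
        self._blobs.setdefault(digest, data)  # store the bytes only once
        self._refs[digest] = self._refs.get(digest, 0) + 1
        self._paths[path] = digest
        return digest

    def read(self, path: str) -> bytes:
        return self._blobs[self._paths[path]]

    def delete(self, path: str) -> None:
        self._release(self._paths.pop(path))

    def _release(self, digest: str) -> None:
        self._refs[digest] -= 1
        if self._refs[digest] == 0:           # last reference gone: drop the blob
            del self._refs[digest]
            del self._blobs[digest]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


store = ContentStore()
store.write("/workspace/a.txt", b"Hello World")
store.write("/shared/a-copy.txt", b"Hello World")  # identical content, stored once
print(store.stored_bytes())  # 11
```

Reference counting is what makes deletes safe: a blob is only removed once no path refers to it any more.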

Local Backend (Default)

Store files on local filesystem:

import nexus

# Auto-detected from config or uses default
nx = nexus.connect()

# Or explicitly configure
nx = nexus.connect(config={
    "backend": "local",
    "data_dir": "./nexus-data"
})

Google Cloud Storage (GCS) Backend

Store files in Google Cloud Storage with local metadata:

import nexus

# Connect with GCS backend
nx = nexus.connect(config={
    "backend": "gcs",
    "gcs_bucket_name": "my-nexus-bucket",
    "gcs_project_id": "my-gcp-project",  # Optional
    "gcs_credentials_path": "/path/to/credentials.json",  # Optional
})

Authentication Methods:

Service Account Key: Provide gcs_credentials_path
Application Default Credentials (if not provided):
- GOOGLE_APPLICATION_CREDENTIALS environment variable
- gcloud auth application-default login credentials
- GCE/Cloud Run service account (when running on GCP)

Using Config File (nexus.yaml):

backend: gcs
gcs_bucket_name: my-nexus-bucket
gcs_project_id: my-gcp-project  # Optional
# gcs_credentials_path: /path/to/credentials.json  # Optional

Using Environment Variables:

export NEXUS_BACKEND=gcs
export NEXUS_GCS_BUCKET_NAME=my-nexus-bucket
export NEXUS_GCS_PROJECT_ID=my-gcp-project  # Optional
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json  # Optional
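The `NEXUS_*` variables above look like the config keys with a `NEXUS_` prefix, upper-cased (for example `NEXUS_GCS_BUCKET_NAME` for `gcs_bucket_name`). Assuming that convention holds, a loader could collect them as in this sketch. `config_from_env` and `merge_config` are hypothetical helpers, and the precedence order (later layers win) is an assumption rather than documented Nexus behavior.

```python
import os
from collections.abc import Mapping


def config_from_env(environ: Mapping[str, str] = os.environ,
                    prefix: str = "NEXUS_") -> dict[str, str]:
    """Map NEXUS_* variables to config keys, e.g. NEXUS_GCS_BUCKET_NAME -> gcs_bucket_name.

    Variables without the prefix (such as GOOGLE_APPLICATION_CREDENTIALS, which the
    Google client libraries read themselves) are ignored.
    """
    return {
        key[len(prefix):].lower(): value
        for key, value in environ.items()
        if key.startswith(prefix) and value
    }


def merge_config(*layers: Mapping[str, str]) -> dict[str, str]:
    """Merge config layers left to right; later layers override earlier ones."""
    merged: dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    return merged


env = {"NEXUS_BACKEND": "gcs", "NEXUS_GCS_BUCKET_NAME": "my-nexus-bucket", "HOME": "/root"}
defaults = {"backend": "local", "data_dir": "./nexus-data"}
config = merge_config(defaults, config_from_env(env))
print(config)  # {'backend': 'gcs', 'data_dir': './nexus-data', 'gcs_bucket_name': 'my-nexus-bucket'}
```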

CLI Usage with GCS:

# Write file to GCS
nexus write /workspace/data.txt "Hello GCS!" \
  --backend=gcs \
  --gcs-bucket=my-nexus-bucket

# Or use config file (simpler!)
nexus write /workspace/data.txt "Hello GCS!" --config=nexus.yaml

Advanced: Direct Backend API

For advanced use cases, instantiate backends directly:

from nexus import NexusFS, LocalBackend, GCSBackend

# Local backend
nx_local = NexusFS(
    backend=LocalBackend("/path/to/data"),
    db_path="./metadata.db"
)

# GCS backend
nx_gcs = NexusFS(
    backend=GCSBackend(
        bucket_name="my-bucket",
        project_id="my-project",
        credentials_path="/path/to/creds.json"
    ),
    db_path="./gcs-metadata.db"
)

# Same API for both!
nx_local.write("/file.txt", b"data")
nx_gcs.write("/file.txt", b"data")

Backend Comparison

Feature	Local Backend	GCS Backend
Content Storage	Local filesystem	Google Cloud Storage
Metadata Storage	Local SQLite	Local SQLite
Deduplication	✅ CAS (30-50% savings)	✅ CAS (30-50% savings)
Multi-machine Access	❌ Single machine	✅ Shared across machines
Durability	Single disk	99.999999999% (11 nines)
Latency	<1ms (local)	10-50ms (network)
Cost	Free (local disk)	GCS storage pricing
Use Case	Development, single machine	Teams, production, backup

Coming Soon

Amazon S3 Backend (v0.7.0)
Azure Blob Storage (v0.7.0)
Google Drive (v0.7.0)
SharePoint (v0.7.0)

Installation

Using pip (Recommended)

# Install from PyPI
pip install nexus-ai-fs

# Verify installation
nexus --version

From Source (Development)

# Clone the repository
git clone https://github.com/nexi-lab/nexus.git
cd nexus

# Install using uv (recommended for faster installs)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -e ".[dev]"

# Or using pip
pip install -e ".[dev]"

Development Setup

# Install development dependencies
uv pip install -e ".[dev,test]"

# Run tests
pytest

# Run type checking
mypy src/nexus

# Format code
ruff format .

# Lint
ruff check .

CLI Usage

Nexus provides a beautiful command-line interface for all file operations. After installation, the nexus command will be available.

Quick Start

# Initialize a new workspace
nexus init ./my-workspace

# Write a file
nexus write /workspace/hello.txt "Hello, Nexus!"

# Read a file
nexus cat /workspace/hello.txt

# List files
nexus ls /workspace
nexus ls /workspace --recursive
nexus ls /workspace --long  # Detailed view with metadata

Available Commands

File Operations

# Write content to a file
nexus write /path/to/file.txt "content"
echo "content" | nexus write /path/to/file.txt --input -

# Display file contents (with syntax highlighting)
nexus cat /workspace/code.py

# Copy files
nexus cp /source.txt /dest.txt

# Delete files
nexus rm /workspace/old-file.txt
nexus rm /workspace/old-file.txt --force  # Skip confirmation

# Show file information
nexus info /workspace/data.txt

Directory Operations

# Create directory
nexus mkdir /workspace/data
nexus mkdir /workspace/deep/nested/dir --parents

# Remove directory
nexus rmdir /workspace/data
nexus rmdir /workspace/data --recursive --force

File Discovery

# List files
nexus ls /workspace
nexus ls /workspace --recursive
nexus ls /workspace --long  # Show size, modified time, etag

# Find files by pattern (glob)
nexus glob "**/*.py"  # All Python files recursively
nexus glob "*.txt" --path /workspace  # Text files in workspace
nexus glob "test_*.py"  # Test files

# Search file contents (grep)
nexus grep "TODO"  # Find all TODO comments
nexus grep "def \w+" --file-pattern "**/*.py"  # Find function definitions
nexus grep "error" --ignore-case  # Case-insensitive search
nexus grep "TODO" --max-results 50  # Limit results

# Search modes (v0.2.0+)
nexus grep "revenue" --file-pattern "**/*.pdf"  # Auto mode: tries parsed first
nexus grep "revenue" --file-pattern "**/*.pdf" --search-mode=parsed  # Only parsed content
nexus grep "TODO" --search-mode=raw  # Only raw text (skip parsing)

# Result shows source type
# Match: TODO (parsed) ← from parsed PDF
# Match: TODO (raw) ← from source code
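The three search modes behave like a fallback chain: `auto` uses parsed text when a parser can extract it and otherwise falls back to raw bytes, `parsed` searches only extracted text, and `raw` never parses. The sketch below models that behavior for a single file. The `parse` callback, the `Match` record and `grep_file` are hypothetical stand-ins, not Nexus internals.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical parser hook: returns extracted text, or None if the type has no parser.
Parser = Callable[[str, bytes], Optional[str]]


@dataclass
class Match:
    path: str
    line_no: int
    line: str
    source: str  # "parsed" or "raw"


def grep_file(path: str, raw: bytes, pattern: str,
              mode: str = "auto", parse: Optional[Parser] = None) -> list[Match]:
    """Search one file with --search-mode=auto|parsed|raw semantics."""
    if mode not in ("auto", "parsed", "raw"):
        raise ValueError(f"unknown search mode: {mode!r}")
    regex = re.compile(pattern)
    text, source = None, "raw"
    if mode != "raw" and parse is not None:
        parsed = parse(path, raw)
        if parsed is not None:
            text, source = parsed, "parsed"
    if text is None:
        if mode == "parsed":
            return []  # parsed-only search skips files without parsed content
        text = raw.decode("utf-8", errors="replace")
    return [Match(path, n, line, source)
            for n, line in enumerate(text.splitlines(), start=1)
            if regex.search(line)]


def fake_pdf_parser(path: str, raw: bytes) -> Optional[str]:
    # Pretend only PDFs are parseable and always contain this text.
    return "Q4 revenue grew\nTODO: confirm totals" if path.endswith(".pdf") else None


for m in grep_file("/docs/q4.pdf", b"%PDF-1.7 ...", "TODO", parse=fake_pdf_parser):
    print(f"{m.path}:{m.line_no}: {m.line} ({m.source})")  # /docs/q4.pdf:2: TODO: confirm totals (parsed)
```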

Work Queue Operations

# Query work items by status
nexus work ready --limit 10  # Get ready work items (high priority first)
nexus work pending  # Get pending work items
nexus work blocked  # Get blocked work items (with dependency info)
nexus work in-progress  # Get currently processing items

# View aggregate statistics
nexus work status  # Show counts for all work queues

# Output as JSON (for scripting)
nexus work ready --json
nexus work status --json

Note: Work items are files with special metadata (status, priority, depends_on, worker_id). See docs/SQL_VIEWS_FOR_WORK_DETECTION.md for details on setting up work queues.
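To make the idea of SQL views for work detection concrete, here is a minimal SQLite sketch. The table layout, the status values and the single-`depends_on` simplification are assumptions for illustration only; the actual Nexus views are described in docs/SQL_VIEWS_FOR_WORK_DETECTION.md. An item counts as ready when it is pending and its dependency (if any) has completed, and as blocked when that dependency is still outstanding.

```python
import sqlite3

# Hypothetical schema: one row per work-item file, using the metadata fields
# named above (status, priority, depends_on, worker_id).
SCHEMA = """
CREATE TABLE work_items (
    path       TEXT PRIMARY KEY,
    status     TEXT NOT NULL,             -- 'pending', 'in_progress', 'completed'
    priority   INTEGER NOT NULL DEFAULT 0,
    depends_on TEXT,                      -- path of one prerequisite item, or NULL
    worker_id  TEXT
);

-- Ready: pending, with no dependency or a completed one.
CREATE VIEW ready_work AS
SELECT w.path, w.priority
FROM work_items AS w
LEFT JOIN work_items AS dep ON dep.path = w.depends_on
WHERE w.status = 'pending'
  AND (w.depends_on IS NULL OR dep.status = 'completed');

-- Blocked: pending, with a dependency that is missing or not yet completed.
CREATE VIEW blocked_work AS
SELECT w.path, w.depends_on
FROM work_items AS w
LEFT JOIN work_items AS dep ON dep.path = w.depends_on
WHERE w.status = 'pending'
  AND w.depends_on IS NOT NULL
  AND (dep.status IS NULL OR dep.status <> 'completed');
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO work_items (path, status, priority, depends_on) VALUES (?, ?, ?, ?)",
    [
        ("/jobs/extract", "completed", 5, None),
        ("/jobs/summarize", "pending", 3, "/jobs/extract"),
        ("/jobs/publish", "pending", 9, "/jobs/summarize"),
        ("/jobs/cleanup", "pending", 1, None),
    ],
)
# Highest priority first, in the spirit of `nexus work ready`.
print(conn.execute("SELECT path FROM ready_work ORDER BY priority DESC, path").fetchall())
print(conn.execute("SELECT path, depends_on FROM blocked_work").fetchall())
```

Keeping the readiness rule in a view means every worker asks the database the same question instead of re-implementing dependency checks in application code.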

Examples

Initialize and populate a workspace:

# Create workspace
nexus init ./my-project

# Create structure
nexus mkdir /workspace/src --data-dir ./my-project/nexus-data
nexus mkdir /workspace/tests --data-dir ./my-project/nexus-data

# Add files
echo "print('Hello World')" | nexus write /workspace/src/main.py --input - \
  --data-dir ./my-project/nexus-data

# List everything
nexus ls / --recursive --long --data-dir ./my-project/nexus-data

Find and analyze code:

# Find all Python files
nexus glob "**/*.py"

# Search for TODO comments
nexus grep "TODO|FIXME" --file-pattern "**/*.py"

# Find all test files
nexus glob "**/test_*.py"

# Search for function definitions
nexus grep "^def \w+\(" --file-pattern "**/*.py"

Work with data:

# Write JSON data
echo '{"name": "test", "value": 42}' | nexus write /data/config.json --input -

# Display with syntax highlighting
nexus cat /data/config.json

# Get file information
nexus info /data/config.json

Global Options

All commands support these global options:

# Use custom config file
nexus ls /workspace --config /path/to/config.yaml

# Override data directory
nexus ls /workspace --data-dir /path/to/nexus-data

# Combine both (config takes precedence)
nexus ls /workspace --config ./my-config.yaml --data-dir ./data

Help

Get help for any command:

nexus --help  # Show all commands
nexus ls --help  # Show help for ls command
nexus grep --help  # Show help for grep command

Remote Nexus Server

Nexus includes a JSON-RPC server that exposes the full NexusFileSystem interface over HTTP, enabling remote filesystem access and FUSE mounts to remote servers.

Quick Start

Start the server (API key authentication is optional):

nexus serve --host 0.0.0.0 --port 8080 --api-key mysecret

Then use the remote filesystem from Python:

from nexus import RemoteNexusFS

nx = RemoteNexusFS(
    server_url="http://localhost:8080",
    api_key="mysecret"  # Optional
)

# Same API as local NexusFS!
nx.write("/workspace/hello.txt", b"Hello Remote!")
content = nx.read("/workspace/hello.txt")
files = nx.list("/workspace", recursive=True)

Features

Full NexusFS Interface: All filesystem operations exposed over RPC (read, write, list, glob, grep, mkdir, etc.)
JSON-RPC 2.0 Protocol: Standard RPC protocol with proper error handling
API Key Authentication: Optional Bearer token authentication for security
Backend Agnostic: Works with local and GCS backends
FUSE Compatible: Mount remote Nexus servers as local filesystems
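Because the server speaks JSON-RPC 2.0 over HTTP, any HTTP client can in principle talk to it. The sketch below only builds a request and decodes a response with the standard library; it sends nothing. The method name (`read`), the parameter shape and posting to the server URL's root path are assumptions, since the wire format is not documented here; in practice, use `RemoteNexusFS`.

```python
import json
import urllib.request
from itertools import count
from typing import Any, Optional

_request_ids = count(1)


def build_rpc_request(server_url: str, method: str, params: dict[str, Any],
                      api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 call as an HTTP POST."""
    payload = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"  # optional API key auth
    return urllib.request.Request(server_url, data=json.dumps(payload).encode("utf-8"),
                                  headers=headers, method="POST")


def parse_rpc_response(body: bytes) -> Any:
    """Return `result`, or raise if the server replied with a JSON-RPC error object."""
    message = json.loads(body)
    if "error" in message:
        error = message["error"]
        raise RuntimeError(f"RPC error {error.get('code')}: {error.get('message')}")
    return message["result"]


# Hypothetical method name and params; sending would be urllib.request.urlopen(req).
req = build_rpc_request("http://localhost:8080", "read",
                        {"path": "/workspace/hello.txt"}, api_key="mysecret")
print(req.get_method(), req.full_url, req.get_header("Authorization"))
```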

Remote Client Usage

from nexus import RemoteNexusFS

# Connect to remote server
nx = RemoteNexusFS(
    server_url="http://your-server:8080",
    api_key="your-api-key"  # Optional
)

# All standard operations work
nx.write("/workspace/data.txt", b"content")
content = nx.read("/workspace/data.txt")
files = nx.list("/workspace", recursive=True)
results = nx.glob("**/*.py")
matches = nx.grep("TODO", file_pattern="*.py")

Server Options

# Start with custom host/port
nexus serve --host 0.0.0.0 --port 8080

# Start with API key authentication
nexus serve --api-key mysecret

# Start with GCS backend
nexus serve --backend=gcs --gcs-bucket=my-bucket --api-key mysecret

# Custom data directory
nexus serve --data-dir /path/to/data

Deploying Nexus Server

For production use, deploy Nexus to a VM with persistent storage for metadata:

Example Docker Deployment:

# Build Docker image
cd /path/to/nexus
docker build -t nexus-server:latest .

# Run server with GCS backend
docker run -d \
  --name nexus-server \
  --restart unless-stopped \
  -p 8080:8080 \
  -v /var/lib/nexus:/app/data \
  -e NEXUS_API_KEY="your-api-key" \
  -e NEXUS_BACKEND=gcs \
  -e NEXUS_GCS_BUCKET="your-bucket-name" \
  -e NEXUS_GCS_PROJECT="YOUR-PROJECT-ID" \
  -e NEXUS_DB_PATH="/app/data/nexus-metadata.db" \
  nexus-server:latest

Deployment Features:

Persistent Metadata: SQLite database stored on VM disk at /var/lib/nexus/
Content Storage: All file content stored in configured backend (GCS, local, etc.)
Content Deduplication: CAS-based storage with 30-50% savings
Full NexusFS API: All operations available remotely

FUSE Mount: Use Standard Unix Tools (v0.2.0)

Mount Nexus to a local path and use any standard Unix tool seamlessly - ls, cat, grep, vim, and more!

Installation

First, install FUSE support:

# Install Nexus with FUSE support
pip install "nexus-ai-fs[fuse]"  # quotes stop shells such as zsh from expanding the brackets

# Platform-specific FUSE library:
# macOS: Install macFUSE from https://osxfuse.github.io/
# Linux: sudo apt-get install fuse3  # or equivalent for your distro

Quick Start

# Mount Nexus to local path (smart mode by default)
nexus mount /mnt/nexus

# Now use ANY standard Unix tools!
ls -la /mnt/nexus/workspace/
cat /mnt/nexus/workspace/notes.txt
grep -r "TODO" /mnt/nexus/workspace/
find /mnt/nexus -name "*.py"
vim /mnt/nexus/workspace/code.py
git clone /some/repo /mnt/nexus/repos/myproject

# Unmount when done
nexus unmount /mnt/nexus

Quick Start Examples

Example 1: Default (Explicit Views) - Best for Mixed Workflows

# Mount normally
nexus mount /mnt/nexus

# Binary tools work directly
evince /mnt/nexus/docs/report.pdf     # PDF viewer works ✓

# Add .txt for text operations
cat /mnt/nexus/docs/report.pdf.txt    # Read as text
grep "results" /mnt/nexus/docs/*.pdf.txt

# Virtual views auto-generated
ls /mnt/nexus/docs/
# → report.pdf
# → report.pdf.txt  (virtual)
# → report.pdf.md   (virtual)

Example 2: Auto-Parse - Best for Search-Heavy Workflows

# Mount with auto-parse
nexus mount /mnt/nexus --auto-parse

# grep works directly on PDFs!
grep "results" /mnt/nexus/docs/*.pdf      # No .txt needed! ✓
cat /mnt/nexus/docs/report.pdf            # Returns text ✓

# Search across everything
grep -r "TODO" /mnt/nexus/workspace/      # Searches PDFs, Excel, etc.

# Binary via .raw/ when needed
evince /mnt/nexus/.raw/docs/report.pdf   # For PDF viewer

Example 3: Real-World Script

#!/bin/bash
# Find all PDFs mentioning "invoice"

nexus mount /mnt/nexus --auto-parse --daemon

# Now grep works on PDFs!
grep -l "invoice" /mnt/nexus/documents/*.pdf

# Process results (read line by line so filenames with spaces survive)
grep -l "invoice" /mnt/nexus/documents/*.pdf | while IFS= read -r pdf; do
    echo "Found in: $pdf"
    grep -n "invoice" "$pdf" | head -5
done

nexus unmount /mnt/nexus

File Access: Two Modes

Nexus supports two ways to access files - choose what fits your workflow:

1. Explicit Views (Default) - Best for Compatibility

Binary files return binary, use .txt/.md suffixes for parsed content:

nexus mount /mnt/nexus

# Binary files work with native tools
evince /mnt/nexus/docs/report.pdf      # PDF viewer gets binary ✓
libreoffice /mnt/nexus/data/sheet.xlsx # Excel app gets binary ✓

# Add .txt to search/read as text
cat /mnt/nexus/docs/report.pdf.txt     # Returns parsed text
grep "pattern" /mnt/nexus/docs/*.pdf.txt

# Virtual views appear automatically
ls /mnt/nexus/docs/
# → report.pdf
# → report.pdf.txt  (virtual view)
# → report.pdf.md   (virtual view)

When to use: You want both binary tools AND text search to work

2. Auto-Parse Mode - Best for Search/Grep

Binary files return parsed text directly, use .raw/ for binary:

nexus mount /mnt/nexus --auto-parse

# Binary files return text directly - perfect for grep!
cat /mnt/nexus/docs/report.pdf         # Returns parsed text ✓
grep "pattern" /mnt/nexus/docs/*.pdf   # Works directly! ✓
less /mnt/nexus/docs/report.pdf        # Page through text ✓

# Access binary via .raw/ when needed
evince /mnt/nexus/.raw/docs/report.pdf # PDF viewer gets binary

# No .txt/.md suffixes - files return text by default
ls /mnt/nexus/docs/
# → report.pdf  (returns text when read)

When to use: Text search is your primary use case, binary tools are secondary
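Both access modes boil down to a path-mapping rule. The sketch below shows one way a FUSE layer could resolve paths and build listings under each mode: in explicit-view mode a `.txt` or `.md` suffix on a parseable file selects a parsed view, while in auto-parse mode parseable files read as text and `/.raw/` exposes the original bytes. The `PARSEABLE` set, `list_dir` and `resolve` are hypothetical, not Nexus's FUSE implementation.

```python
# Hypothetical set of document types that have parsers.
PARSEABLE = {".pdf", ".xlsx", ".docx"}
VIEWS = {".txt": "text", ".md": "markdown"}


def ext(path: str) -> str:
    """Lower-cased extension of the last path segment, or '' if none."""
    name = path.rsplit("/", 1)[-1]
    dot = name.rfind(".")
    return name[dot:].lower() if dot > 0 else ""


def list_dir(names: list[str], auto_parse: bool) -> list[str]:
    """Directory listing as a user would see it through the mount."""
    shown = list(names)
    if not auto_parse:  # explicit views: add virtual .txt/.md entries
        shown += [n + s for n in names if ext(n) in PARSEABLE for s in VIEWS]
    return sorted(shown)


def resolve(path: str, auto_parse: bool) -> tuple[str, str]:
    """Map a mount path to (stored path, read mode: 'raw', 'text' or 'markdown')."""
    if auto_parse:
        if path.startswith("/.raw/"):
            return path[len("/.raw"):], "raw"  # original bytes
        return path, ("text" if ext(path) in PARSEABLE else "raw")
    for suffix, view in VIEWS.items():
        base = path[: -len(suffix)]
        if path.endswith(suffix) and ext(base) in PARSEABLE:
            return base, view  # virtual view of a parseable file
    return path, "raw"


print(list_dir(["report.pdf"], auto_parse=False))
```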

Mount Modes (Content Parsing)

Control what gets parsed:

# Smart mode (default) - Auto-detect file types
nexus mount /mnt/nexus --mode=smart
# ✅ PDFs, Excel, Word → parsed
# ✅ .py, .txt, .md → pass-through
# ✅ Best for mixed content

# Text mode - Parse everything aggressively
nexus mount /mnt/nexus --mode=text
# ✅ All files parsed to text
# ⚠️  Slower (always parses)

# Binary mode - No parsing at all
nexus mount /mnt/nexus --mode=binary
# ✅ All files return binary
# ❌ grep won't work on PDFs
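The `--mode` flag can be read as a per-file decision about whether to parse. This sketch decides by file extension; `DOCUMENT_EXTS` and `should_parse` are hypothetical, and Nexus's smart mode may rely on other signals, such as file contents.

```python
# Hypothetical document extensions; real smart-mode detection may differ.
DOCUMENT_EXTS = {".pdf", ".xlsx", ".xls", ".docx", ".doc"}


def should_parse(path: str, mode: str) -> bool:
    """Whether a read under --mode=smart|text|binary returns parsed text."""
    if mode == "binary":
        return False  # never parse
    if mode == "text":
        return True   # always parse (slower)
    if mode == "smart":
        dot, slash = path.rfind("."), path.rfind("/")
        ext = path[dot:].lower() if dot > slash else ""
        return ext in DOCUMENT_EXTS  # documents parsed, code and text passed through
    raise ValueError(f"unknown mount mode: {mode!r}")


print(should_parse("/docs/report.pdf", "smart"), should_parse("/src/main.py", "smart"))
```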

Comparison Table

Feature	Explicit Views (default)	Auto-Parse Mode (`--auto-parse`)
PDF viewers work	✅ `evince file.pdf`	⚠️ `evince .raw/file.pdf`
grep on PDFs	⚠️ `grep *.pdf.txt`	✅ `grep *.pdf`
Excel apps work	✅ `libreoffice file.xlsx`	⚠️ `libreoffice .raw/file.xlsx`
Best for	Binary tools + search	Text search primary use case
Virtual views	`.txt`, `.md` suffixes	No suffixes needed
Binary access	Direct (`file.pdf`)	Via `.raw/` directory

Background (Daemon) Mode

Run the mount in the background so you can close your terminal:

# Mount in background
nexus mount /mnt/nexus --daemon

# Do your work...
ls /mnt/nexus
cat /mnt/nexus/workspace/file.txt

# Later, unmount when done
nexus unmount /mnt/nexus

Performance & Caching (v0.2.0)

FUSE mounts include automatic caching for improved performance. Caching is enabled by default with sensible defaults - no configuration needed for most users.

Default Performance:

✅ Attribute caching (1024 entries, 60s TTL) - Makes ls and stat operations faster
✅ Content caching (100 files) - Speeds up repeated file reads
✅ Parsed content caching (50 files) - Accelerates PDF/Excel text extraction
✅ Automatic cache invalidation on writes/deletes - Always consistent

Advanced: Custom Cache Configuration

For power users with specific performance requirements:

from nexus import connect
from nexus.fuse import mount_nexus

nx = connect(config={"data_dir": "./nexus-data"})

# Custom cache configuration
cache_config = {
    "attr_cache_size": 2048,      # Double the attribute cache (default: 1024)
    "attr_cache_ttl": 120,         # Cache attributes for 2 minutes (default: 60s)
    "content_cache_size": 200,     # Cache 200 files (default: 100)
    "parsed_cache_size": 100,      # Cache 100 parsed files (default: 50)
    "enable_metrics": True         # Track cache hit/miss rates (default: False)
}

fuse = mount_nexus(
    nx,
    "/mnt/nexus",
    mode="smart",
    cache_config=cache_config,
    foreground=False
)

# View cache performance (if metrics enabled)
# Note: Access via fuse.fuse.operations.cache

Cache Configuration Options:

Option	Default	Description
`attr_cache_size`	1024	Max number of cached file attribute entries
`attr_cache_ttl`	60	Time-to-live for attributes in seconds
`content_cache_size`	100	Max number of cached file contents
`parsed_cache_size`	50	Max number of cached parsed contents (PDFs, etc.)
`enable_metrics`	False	Enable cache hit/miss tracking

When to Tune Cache Settings:

Large directory listings: Increase attr_cache_size to 2048+ and attr_cache_ttl to 120+
Many small files: Increase content_cache_size to 500+
Heavy PDF/Excel use: Increase parsed_cache_size to 200+
Performance analysis: Enable enable_metrics to measure cache effectiveness
Memory-constrained: Decrease all cache sizes (e.g., 512 / 50 / 25)

Notes:

Caches are thread-safe - safe for concurrent access
Caches are automatically invalidated on file writes, deletes, and renames
Default settings work well for most use cases - tune only if needed
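The attribute cache described above combines a size bound (`attr_cache_size`) with a time-to-live (`attr_cache_ttl`). A minimal thread-safe version of that pattern is an LRU cache whose entries carry an expiry time, plus an explicit `invalidate` call on writes, deletes and renames. `TTLCache` is a hypothetical illustration, not the class Nexus uses; the injectable `clock` exists only to make it testable.

```python
import threading
import time
from collections import OrderedDict
from collections.abc import Callable, Hashable
from typing import Any, Optional


class TTLCache:
    """Thread-safe LRU cache whose entries also expire after `ttl` seconds."""

    def __init__(self, maxsize: int = 1024, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.maxsize = maxsize
        self.ttl = ttl
        self._clock = clock
        self._data: OrderedDict = OrderedDict()  # key -> (expires_at, value)
        self._lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            entry = self._data.get(key)
            if entry is None or entry[0] <= self._clock():
                self._data.pop(key, None)  # forget the expired entry, if any
                self.misses += 1
                return None
            self._data.move_to_end(key)    # mark as most recently used
            self.hits += 1
            return entry[1]

    def put(self, key: Hashable, value: Any) -> None:
        with self._lock:
            self._data[key] = (self._clock() + self.ttl, value)
            self._data.move_to_end(key)
            while len(self._data) > self.maxsize:
                self._data.popitem(last=False)  # evict least recently used

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry; call on write, delete and rename of `key`."""
        with self._lock:
            self._data.pop(key, None)


attrs = TTLCache(maxsize=2048, ttl=120)  # like attr_cache_size / attr_cache_ttl
attrs.put("/workspace/data.txt", {"size": 11})
print(attrs.get("/workspace/data.txt"))
```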

rclone-style CLI Commands (v0.2.0)

Nexus provides efficient file operations inspired by rclone, with automatic deduplication and progress tracking:

Sync Command

One-way synchronization with hash-based change detection:

# Sync local directory to Nexus (only copies changed files)
nexus sync ./local/dataset/ /workspace/training/

# Preview changes before syncing (dry-run)
nexus sync ./data/ /workspace/backup/ --dry-run

# Mirror sync - delete extra files in destination
nexus sync /workspace/source/ /workspace/dest/ --delete

# Disable hash comparison (force copy all files)
nexus sync ./data/ /workspace/ --no-checksum
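Hash-based change detection amounts to comparing content digests on both sides and copying only what differs. The sketch below computes a sync plan without touching any files, which is also all a dry run needs; `delete_extra` mirrors `--delete` and `checksum=False` mirrors `--no-checksum`. The helper names and the use of SHA-256 are assumptions, and a production tool might compare sizes first to avoid hashing files that obviously differ.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SyncPlan:
    copy: list[str] = field(default_factory=list)    # new or changed files
    skip: list[str] = field(default_factory=list)    # identical content on both sides
    delete: list[str] = field(default_factory=list)  # extra files (only with --delete)


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def plan_sync(src: dict[str, bytes], dst: dict[str, bytes],
              checksum: bool = True, delete_extra: bool = False) -> SyncPlan:
    """Plan a one-way sync; src and dst map relative paths to file contents."""
    plan = SyncPlan()
    for rel in sorted(src):
        same = checksum and rel in dst and sha256(src[rel]) == sha256(dst[rel])
        (plan.skip if same else plan.copy).append(rel)
    if delete_extra:
        plan.delete = sorted(set(dst) - set(src))
    return plan


src = {"a.txt": b"1", "b.txt": b"2", "c.txt": b"3"}
dst = {"a.txt": b"1", "b.txt": b"old", "z.txt": b"x"}
print(plan_sync(src, dst, delete_extra=True))  # a dry run just prints the plan
```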

Copy Command

Smart copy with automatic deduplication:

# Copy directory recursively (skips identical files)
nexus copy ./local/data/ /workspace/project/ --recursive

# Copy within Nexus (leverages CAS deduplication)
nexus copy /workspace/source/ /workspace/dest/ --recursive

# Copy Nexus to local
nexus copy /workspace/data/ ./backup/ --recursive

# Copy single file
nexus copy /workspace/file.txt /workspace/copy.txt

# Disable checksum verification
nexus copy ./data/ /workspace/ --recursive --no-checksum

Move Command

Efficient file/directory moves with confirmation prompts:

# Move file (rename if possible, copy+delete otherwise)
nexus move /workspace/old.txt /workspace/new.txt

# Move directory without confirmation
nexus move /workspace/old_dir/ /archives/2024/ --force

Tree Command

Visualize directory structure as ASCII tree:

# Show full directory tree
nexus tree /workspace/

# Limit depth to 2 levels
nexus tree /workspace/ -L 2

# Show file sizes
nexus tree /workspace/ --show-size
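A tree view like this can be produced from a flat list of paths by first building a nested dictionary and then walking it with box-drawing prefixes; a depth limit such as `-L 2` simply stops the walk early. `render_tree` is a hypothetical helper, not part of the Nexus CLI.

```python
from typing import Optional


def render_tree(paths: list[str], max_depth: Optional[int] = None) -> str:
    """Render slash-separated paths as an ASCII tree, optionally depth-limited."""
    root: dict = {}
    for path in paths:
        node = root
        for part in filter(None, path.split("/")):  # skip empty segments
            node = node.setdefault(part, {})

    lines: list[str] = []

    def walk(node: dict, prefix: str, depth: int) -> None:
        if max_depth is not None and depth > max_depth:
            return
        names = sorted(node)
        for i, name in enumerate(names):
            last = i == len(names) - 1
            lines.append(prefix + ("└── " if last else "├── ") + name)
            walk(node[name], prefix + ("    " if last else "│   "), depth + 1)

    walk(root, "", 1)
    return "\n".join(lines)


print(render_tree(["src/main.py", "src/util.py", "README.md"]))
```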

Size Command

Calculate directory sizes with human-readable output:

# Calculate total size
nexus size /workspace/project/

# Human-readable output (KB, MB, GB)
nexus size /workspace/ --human

# Show top 10 largest files
nexus size /workspace/ --human --details
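For reference, here is a small sketch of the human-readable formatting and top-N selection these flags describe. Whether Nexus uses 1024-based or 1000-based units is not stated here; this sketch assumes 1024, and `human_size` and `top_files` are hypothetical names.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units (B, KB, MB, GB, TB)."""
    if num_bytes < 0:
        raise ValueError("size cannot be negative")
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{int(size)} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


def top_files(sizes: dict[str, int], n: int = 10) -> list[tuple[str, int]]:
    """The n largest files, largest first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]


print(human_size(1536))  # 1.5 KB
```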

Features:

Hash-based change detection - Only copies changed files
Progress bars - Visual feedback for long operations
Dry-run mode - Preview changes before execution
Cross-platform paths - Works with local filesystem and Nexus paths
Automatic deduplication - Leverages Content-Addressable Storage (CAS)

Performance Comparison

Method	Speed	Content-Aware	Use Case
`grep -r /mnt/nexus/`	Medium	✅ Yes (via mount)	Interactive use
`nexus grep "pattern"`	Fast (DB-backed)	✅ Yes	Large-scale search
Standard tools	Familiar	✅ Yes (via mount)	Day-to-day work

Use Cases

Interactive Development:

# Mount for interactive work
nexus mount /mnt/nexus
vim /mnt/nexus/workspace/code.py
git clone /mnt/nexus/repos/myproject

Bulk Operations:

# Use rclone-style commands for efficiency
nexus sync /local/dataset/ /workspace/training-data/
nexus tree /workspace/ > structure.txt

Automated Workflows:

# Standard Unix tools in scripts
find /mnt/nexus -name "*.pdf" -exec grep -l "invoice" {} \;
rsync -av /mnt/nexus/workspace/ /backup/

Architecture

Agent Workspace Structure

Every agent gets a structured workspace at /workspace/{tenant}/{agent}/:

/workspace/acme-corp/research-agent/
├── .nexus/                          # Nexus metadata (Git-trackable)
│   ├── agent.yaml                   # Agent configuration
│   ├── commands/                    # Custom commands (markdown files)
│   │   ├── analyze-codebase.md
│   │   └── summarize-docs.md
│   ├── jobs/                        # Background job definitions
│   │   └── daily-summary.yaml
│   ├── memory/                      # File-based memory
│   │   ├── project-knowledge.md
│   │   └── recent-tasks.jsonl
│   └── secrets.encrypted            # KMS-encrypted credentials
├── data/                            # Agent's working data
│   ├── inputs/
│   └── outputs/
└── INSTRUCTIONS.md                  # Agent instructions (auto-loaded)

Path Namespace

/
├── workspace/        # Agent scratch space (hot tier, ephemeral)
├── shared/           # Shared tenant data (warm tier, persistent)
├── external/         # Pass-through backends (no content storage)
├── system/           # System metadata (admin-only)
└── archives/         # Cold storage (read-only)
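The namespace layout can be encoded as a lookup from the first path segment to its tier and access rules, with each agent's workspace path built from the tenant and agent names. Everything in this sketch (the `NAMESPACES` fields, the naming pattern for tenants and agents, and the helpers) is a hypothetical illustration of the layout above, not Nexus code. Normalizing the path first prevents `..` segments from escaping into a different namespace.

```python
import posixpath
import re

# Hypothetical encoding of the namespace diagram above (tier None = no tier given).
NAMESPACES = {
    "workspace": {"tier": "hot", "read_only": False, "admin_only": False, "stores_content": True},
    "shared": {"tier": "warm", "read_only": False, "admin_only": False, "stores_content": True},
    "external": {"tier": None, "read_only": False, "admin_only": False, "stores_content": False},
    "system": {"tier": None, "read_only": False, "admin_only": True, "stores_content": True},
    "archives": {"tier": "cold", "read_only": True, "admin_only": False, "stores_content": True},
}
NAME = re.compile(r"[a-z0-9][a-z0-9-]*")  # hypothetical tenant/agent naming rule


def agent_workspace(tenant: str, agent: str) -> str:
    """Build /workspace/{tenant}/{agent}/ after checking both names."""
    for name in (tenant, agent):
        if not NAME.fullmatch(name):
            raise ValueError(f"invalid tenant or agent name: {name!r}")
    return f"/workspace/{tenant}/{agent}/"


def classify(path: str) -> dict:
    """Return namespace info for an absolute path, resolving '.' and '..' first."""
    if not path.startswith("/"):
        raise ValueError("paths must start with /")
    top = posixpath.normpath(path).lstrip("/").split("/", 1)[0]
    if top not in NAMESPACES:
        raise ValueError(f"unknown namespace: /{top}")
    return {"namespace": top, **NAMESPACES[top]}


print(agent_workspace("acme-corp", "research-agent"))
print(classify("/workspace/../archives/2024/report.pdf")["tier"])  # cold
```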

Core Components

File System Operations

import asyncio
import nexus


async def main():
    # Works in both local and hosted modes
    # Mode determined by config file or environment
    nx = nexus.connect()

    async with nx:
        # Basic operations
        await nx.write("/workspace/data.txt", b"content")
        content = await nx.read("/workspace/data.txt")
        await nx.delete("/workspace/data.txt")

        # Batch operations (copy each source path to the matching destination)
        files = await nx.list("/workspace/", recursive=True)
        sources = ["/workspace/a.txt", "/workspace/b.txt"]
        destinations = ["/shared/a.txt", "/shared/b.txt"]
        results = await nx.copy_batch(sources, destinations)

        # File discovery
        python_files = await nx.glob("**/*.py")
        todos = await nx.grep(r"TODO:|FIXME:", file_pattern="*.py")


asyncio.run(main())

Semantic Search

import asyncio
import nexus


async def main():
    # Search across documents with vector embeddings
    async with nexus.connect() as nx:
        results = await nx.semantic_search(
            path="/docs/",
            query="How does authentication work?",
            limit=10,
            filters={"file_type": "markdown"}
        )

        for result in results:
            print(f"{result.path}:{result.line} - {result.text}")


asyncio.run(main())

LLM-Powered Reading

import asyncio
import nexus


async def main():
    # Read documents with AI, with automatic KV cache
    async with nexus.connect() as nx:
        answer = await nx.llm_read(
            path="/reports/q4-2024.pdf",
            prompt="What were the top 3 challenges?",
            model="claude-sonnet-4",
            max_tokens=1000
        )


asyncio.run(main())

Agent Memory

import asyncio
import nexus


async def main():
    # Store and retrieve agent memories
    async with nexus.connect() as nx:
        await nx.store_memory(
            content="User prefers TypeScript over JavaScript",
            memory_type="preference",
            tags=["coding", "languages"]
        )

        memories = await nx.search_memories(
            query="programming language preferences",
            limit=5
        )


asyncio.run(main())

Prompt Optimization (Coming in v0.9.5)

import nexus


# Track multiple prompt candidates during optimization.
# prompt_variants, test_inputs, predictions and reasoning_chain are placeholders
# supplied by your own optimization loop.
async def optimize(prompt_variants, test_inputs, predictions, reasoning_chain):
    async with nexus.connect() as nx:
        # Start optimization run
        run_id = await nx.start_optimization_run(
            module_name="SearchModule",
            objectives=["accuracy", "latency", "cost"]
        )

        # Store prompt candidates with detailed traces
        for candidate in prompt_variants:
            version_id = await nx.store_prompt_version(
                module_name="SearchModule",
                prompt_template=candidate.template,
                metrics={"accuracy": 0.85, "latency_ms": 450},  # example values
                run_id=run_id
            )

            # Store execution traces for debugging
            await nx.store_execution_trace(
                prompt_version_id=version_id,
                inputs=test_inputs,
                outputs=predictions,
                intermediate_steps=reasoning_chain
            )

        # Analyze tradeoffs across candidates
        analysis = await nx.analyze_prompt_tradeoffs(
            run_id=run_id,
            objectives=["accuracy", "latency_ms", "cost_per_query"]
        )

        # Get per-example results to find failure patterns
        # (version_id is the last candidate stored in the loop above)
        failures = await nx.get_failing_examples(
            prompt_version_id=version_id,
            limit=20
        )
        return analysis, failures
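Multi-objective tradeoff analysis is often done by keeping only the Pareto-optimal candidates: those that no other candidate beats on every objective at once. Whether `analyze_prompt_tradeoffs` works this way is not documented; the sketch below is a generic illustration with hypothetical names and made-up metric values, reusing the objective names from the example above.

```python
from collections.abc import Mapping

# +1: higher is better; -1: lower is better (hypothetical objective directions).
DIRECTIONS = {"accuracy": 1, "latency_ms": -1, "cost_per_query": -1}


def dominates(a: Mapping[str, float], b: Mapping[str, float]) -> bool:
    """True if `a` is no worse than `b` on every objective and better on at least one."""
    no_worse = all(DIRECTIONS[k] * a[k] >= DIRECTIONS[k] * b[k] for k in DIRECTIONS)
    better = any(DIRECTIONS[k] * a[k] > DIRECTIONS[k] * b[k] for k in DIRECTIONS)
    return no_worse and better


def pareto_front(candidates: Mapping[str, Mapping[str, float]]) -> list[str]:
    """Names of candidates that no other candidate dominates."""
    return sorted(
        name for name, metrics in candidates.items()
        if not any(dominates(other, metrics)
                   for other_name, other in candidates.items() if other_name != name)
    )


# Made-up metrics for three prompt versions.
candidates = {
    "v1": {"accuracy": 0.85, "latency_ms": 450, "cost_per_query": 0.002},
    "v2": {"accuracy": 0.90, "latency_ms": 900, "cost_per_query": 0.004},
    "v3": {"accuracy": 0.80, "latency_ms": 500, "cost_per_query": 0.003},
}
print(pareto_front(candidates))  # ['v1', 'v2']: v1 beats v3 on all three objectives
```

Neither v1 nor v2 dominates the other (v2 is more accurate but slower and costlier), so both stay on the front and the final choice depends on which tradeoff you prefer.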

Custom Commands

Create /workspace/{tenant}/{agent}/.nexus/commands/semantic-search.md:

---
name: semantic-search
description: Search codebase semantically
allowed-tools: [semantic_read, glob, grep]
required-scopes: [read]
model: sonnet
---

## Your task

Given query: {{query}}

1. Use `glob` to find relevant files by pattern
2. Use `semantic_read` to extract relevant sections
3. Summarize findings with file:line citations

Execute via API:

import asyncio
import nexus

async def main():
    async with nexus.connect() as nx:
        result = await nx.execute_command(
            "semantic-search", context={"query": "authentication implementation"}
        )

asyncio.run(main())
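A command file like the one above is a front-matter block followed by a template body. The sketch below splits the two and fills `{{query}}`-style placeholders from a context dictionary. It handles only the flat `key: value` and `key: [a, b]` lines used in this example, whereas a real loader would use a YAML parser; `parse_command` and `render` are hypothetical and not necessarily how `execute_command` works.

```python
import re


def parse_command(text: str) -> tuple[dict, str]:
    """Split a command file into (front matter dict, template body).

    Only handles flat `key: value` and `key: [a, b]` lines.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("command file must start with '---'")
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front matter is not closed with '---'")
    meta: dict = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value
    return meta, "\n".join(lines[end + 1:]).strip()


def render(body: str, context: dict[str, str]) -> str:
    """Replace {{name}} placeholders; a missing name raises KeyError."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: context[m.group(1)], body)


command = """---
name: semantic-search
allowed-tools: [semantic_read, glob, grep]
model: sonnet
---

## Your task

Given query: {{query}}
"""
meta, body = parse_command(command)
print(meta["allowed-tools"])
print(render(body, {"query": "authentication implementation"}))
```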

Technology Stack

Core

Language: Python 3.11+
API Framework: FastAPI
Database: PostgreSQL (prod) / SQLite (dev)
Cache: Redis (prod) / In-memory (dev)
Vector DB: Qdrant
Object Storage: S3-compatible, GCS, Azure Blob

AI/ML

LLM Providers: Anthropic Claude, OpenAI, Google Gemini
Embeddings: text-embedding-3-large, voyage-ai
Parsing: PyPDF2, pandas, openpyxl, Pillow

Infrastructure

Orchestration: Kubernetes (distributed mode)
Monitoring: Prometheus + Grafana
Logging: Structlog + Loki
Admin UI: Simple HTML/JS (jobs, memories, files, operations)

Performance Targets

Metric	Target	Impact
Write Throughput	500-1000 MB/s	10-50× vs direct backend
Read Latency	<10ms	10-50× vs remote storage
Memory Search	<100ms	Vector search across memories
Storage Savings	30-50%	CAS deduplication
Job Resumability	100%	Survives all restarts
LLM Cache Hit Rate	50-90%	Major cost savings
Prompt Versioning	Full lineage	Track optimization history
Training Data Dedup	30-50%	CAS-based deduplication
Prompt Optimization	Multi-candidate	Test multiple strategies in parallel
Trace Storage	Full execution logs	Debug failures, analyze patterns

Configuration

Local Mode

import nexus

# Config via Python (useful for programmatic configuration)
nx = nexus.connect(config={
    "mode": "local",
    "data_dir": "./nexus-data",
    "cache_size_mb": 100,
    "enable_vector_search": True
})

# Or let it auto-discover from nexus.yaml
nx = nexus.connect()

Self-Hosted Deployment

For organizations that want to run their own Nexus instance, create config.yaml:

mode: server  # local or server

database:
  url: postgresql://user:pass@localhost/nexus
  # or for SQLite: sqlite:///./nexus.db

cache:
  type: redis  # memory, redis
  url: redis://localhost:6379

vector_db:
  type: qdrant
  url: http://localhost:6333

backends:
  - type: s3
    bucket: my-company-files
    region: us-east-1

  - type: gdrive
    credentials_path: ./gdrive-creds.json

auth:
  jwt_secret: your-secret-key
  token_expiry_hours: 24

rate_limits:
  default: "100/minute"
  semantic_search: "10/minute"
  llm_read: "50/hour"

Run server:

nexus serve --config config.yaml
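The `rate_limits` entries in the config above appear to use a `count/unit` format. A minimal way to enforce such a limit is a fixed-window counter per key (for example per tenant or API key). The format parsing, the per-key scoping and the fixed-window algorithm are all assumptions made for this sketch; Nexus may use sliding windows or token buckets instead.

```python
import time
from collections.abc import Callable

UNITS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate(spec: str) -> tuple[int, int]:
    """'100/minute' -> (100, 60): allowed requests and window length in seconds."""
    count, _, unit = spec.partition("/")
    return int(count), UNITS[unit.strip()]


class FixedWindowLimiter:
    """Allow at most `limit` calls per key in each fixed time window."""

    def __init__(self, spec: str, clock: Callable[[], float] = time.monotonic) -> None:
        self.limit, self.window = parse_rate(spec)
        self._clock = clock
        self._counts: dict[str, tuple[int, int]] = {}  # key -> (window index, calls)

    def allow(self, key: str) -> bool:
        window = int(self._clock() // self.window)
        current, used = self._counts.get(key, (window, 0))
        if current != window:
            used = 0  # a new window started: reset the counter
        if used >= self.limit:
            return False
        self._counts[key] = (window, used + 1)
        return True


limiter = FixedWindowLimiter("10/minute")  # like rate_limits.semantic_search
print(limiter.allow("tenant-a"))
```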

Security

Multi-Layer Security Model

API Key Authentication: Tenant and agent identification
Row-Level Security (RLS): Database-level tenant isolation
Type-Level Validation: Fail-fast validation before database operations
UNIX-Style Permissions: Owner, group, and mode bits (coming in v0.2.0)
ACL Permissions: Fine-grained access control lists (coming in v0.2.0)

Type-Level Validation (NEW in v0.1.0)

All domain types have validation methods that are called automatically before database operations. This provides:

Fail Fast: Catch invalid data before expensive database operations
Clear Error Messages: Actionable feedback for developers and API consumers
Data Integrity: Prevent invalid data from entering the database
Consistent Validation: Same rules across all code paths

from nexus.core.metadata import FileMetadata
from nexus.core.exceptions import ValidationError

# `store` is an existing metadata store instance (construction not shown)
# Validation happens automatically on put()
try:
    metadata = FileMetadata(
        path="/data/file.txt",  # Must start with /
        backend_name="local",
        physical_path="/storage/file.txt",
        size=-1,  # Invalid: must be >= 0
    )
    store.put(metadata)  # Validates before the DB operation
except ValidationError as e:
    print(f"Validation failed: {e}")
    # Example: "size cannot be negative, got -1"

Validation Rules:

Paths must start with / and not contain null bytes
File sizes and ref counts must be non-negative
Required fields (path, backend_name, physical_path, etc.) must not be empty
Content hashes must be valid 64-character SHA-256 hex strings
Metadata keys must be ≤ 255 characters
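The rules above can be expressed as a single function that raises on the first violation. This is a minimal sketch over a plain dict, not the `nexus.core` implementation: `validate_file_metadata` is hypothetical, the local `ValidationError` stands in for `nexus.core.exceptions.ValidationError`, and the field names `ref_count`, `content_hash`, and `metadata` are assumed.

```python
import re
from typing import Any


class ValidationError(ValueError):
    """Local stand-in for the library's validation exception."""


# Lowercase hex is assumed, matching hashlib's hexdigest() output.
SHA256_HEX = re.compile(r"[0-9a-f]{64}")
REQUIRED = ("path", "backend_name", "physical_path")


def validate_file_metadata(fields: dict[str, Any]) -> None:
    """Raise ValidationError if `fields` breaks any of the listed rules."""
    for name in REQUIRED:
        if not fields.get(name):
            raise ValidationError(f"{name} is required")
    path = fields["path"]
    if not path.startswith("/"):
        raise ValidationError(f"path must start with '/', got {path!r}")
    if "\0" in path:
        raise ValidationError("path must not contain null bytes")
    for name in ("size", "ref_count"):
        value = fields.get(name)
        if value is not None and value < 0:
            raise ValidationError(f"{name} cannot be negative, got {value}")
    content_hash = fields.get("content_hash")
    if content_hash is not None and not SHA256_HEX.fullmatch(content_hash):
        raise ValidationError("content_hash must be a 64-character SHA-256 hex string")
    for key in fields.get("metadata") or {}:
        if len(key) > 255:
            raise ValidationError("metadata keys must be <= 255 characters")
```

Checking cheap, local invariants like these before opening a transaction is what makes the "fail fast" property hold: a bad record never reaches the database.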

Example: Multi-Tenancy Isolation

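-- RLS automatically filters queries by tenant
BEGIN;
-- SET LOCAL lasts only until the end of the current transaction
SET LOCAL app.current_tenant_id = '<tenant_uuid>';

-- All queries are filtered, even if application code forgets a tenant condition
SELECT * FROM file_paths WHERE path = '/data';
-- Returns only rows for the current tenant
COMMIT;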
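In PostgreSQL, this behavior usually comes from enabling row-level security on the table and adding a policy such as `CREATE POLICY tenant_isolation ON file_paths USING (tenant_id = current_setting('app.current_tenant_id')::uuid)`; the actual Nexus schema and policy may differ. SQLite has no RLS, so the sketch below only emulates the idea with Python's built-in `sqlite3`: a view filters every read through an application-defined `current_tenant()` function, so a query without a tenant predicate still sees one tenant's rows. The `file_paths_all` table, `current_tenant()` function, and `paths_for` helper are hypothetical.

```python
import sqlite3

# Stand-in for the PostgreSQL setting app.current_tenant_id.
session = {"tenant_id": None}

conn = sqlite3.connect(":memory:")
# Let the view call an application-defined function even on SQLite builds
# that default to an untrusted schema (older versions ignore this pragma).
conn.execute("PRAGMA trusted_schema = ON")
conn.create_function("current_tenant", 0, lambda: session["tenant_id"])
conn.executescript(
    """
    CREATE TABLE file_paths_all (tenant_id TEXT NOT NULL, path TEXT NOT NULL);
    -- Application code queries the view, never the base table.
    CREATE VIEW file_paths AS
        SELECT tenant_id, path FROM file_paths_all
        WHERE tenant_id = current_tenant();
    """
)
conn.executemany(
    "INSERT INTO file_paths_all VALUES (?, ?)",
    [("tenant-a", "/data"), ("tenant-b", "/data"), ("tenant-a", "/logs")],
)


def paths_for(tenant_id: str, path: str) -> list[tuple[str, str]]:
    session["tenant_id"] = tenant_id
    # No tenant condition here: the view supplies it.
    return conn.execute(
        "SELECT tenant_id, path FROM file_paths WHERE path = ?", (path,)
    ).fetchall()
```

When no tenant is set, `current_tenant()` returns NULL, the comparison is never true, and the view returns no rows, so a missing tenant context fails closed rather than exposing every tenant's data.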

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=nexus --cov-report=html

# Run specific test file
pytest tests/test_filesystem.py

# Run integration tests
pytest tests/integration/ -v

# Run performance tests
pytest tests/performance/ --benchmark-only

Documentation

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

# Fork the repo and clone
git clone https://github.com/yourusername/nexus.git
cd nexus

# Create a feature branch
git checkout -b feature/your-feature

# Make changes and test
uv pip install -e ".[dev,test]"
pytest

# Format and lint
ruff format .
ruff check .

# Commit and push (git add -A also stages new files, which commit -a skips)
git add -A
git commit -m "Add your feature"
git push origin feature/your-feature

License

Apache 2.0 License - see LICENSE for details.

Roadmap

v0.1.0 - Local Mode Foundation (Complete)

Core embedded filesystem (read/write/delete)
SQLite metadata store
Local filesystem backend
Basic file operations (list, glob, grep)
Virtual path routing
Directory operations (mkdir, rmdir, is_directory)
Basic CLI interface with Click and Rich
Metadata export/import (JSONL format)
SQL views for ready work detection
In-memory caching
Batch operations (avoid N+1 queries)
Type-level validation
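The "SQL views for ready work detection" item can be illustrated as follows: a task is ready when it is pending and none of the tasks it depends on is unfinished. This sketch uses Python's built-in `sqlite3`; the `tasks`, `task_deps`, and `ready_work` names and the status values are hypothetical, and the views Nexus ships may be structured differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        status TEXT NOT NULL CHECK (status IN ('pending', 'running', 'done'))
    );
    -- task_id cannot start until depends_on is done.
    CREATE TABLE task_deps (
        task_id INTEGER NOT NULL REFERENCES tasks(id),
        depends_on INTEGER NOT NULL REFERENCES tasks(id)
    );
    -- Ready = pending, with no dependency that is still unfinished.
    CREATE VIEW ready_work AS
        SELECT t.id, t.name FROM tasks AS t
        WHERE t.status = 'pending'
          AND NOT EXISTS (
              SELECT 1 FROM task_deps AS d
              JOIN tasks AS dep ON dep.id = d.depends_on
              WHERE d.task_id = t.id AND dep.status <> 'done'
          );
    """
)
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [(1, "parse", "done"), (2, "embed", "pending"), (3, "index", "pending"), (4, "report", "pending")],
)
conn.executemany("INSERT INTO task_deps VALUES (?, ?)", [(2, 1), (3, 2), (4, 1)])

# One query returns the whole ready set instead of one query per task (no N+1).
ready = [name for (name,) in conn.execute("SELECT name FROM ready_work ORDER BY id")]
```

Here `embed` and `report` are ready because `parse` is done, while `index` waits on `embed`; marking `embed` done moves `index` into the view with no application-side dependency walk.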

v0.2.0 - FUSE Mount & Content-Aware Operations (Current)

FUSE filesystem mount - Mount Nexus to local path (e.g., /mnt/nexus)
Smart read mode - Return parsed text for binary files (PDFs, Excel, etc.)
Virtual file views - Auto-generate .txt and .md views for binary files
Content parser framework - Extensible parser system for document types (MarkItDown)
PDF parser - Extract text and markdown from PDFs
Excel/CSV parser - Parse spreadsheets to structured data
Content-aware file access - Access parsed content via virtual views
Document type detection - Auto-detect MIME types and route to parsers
Mount CLI commands - nexus mount, nexus unmount
Mount modes - Binary, text, and smart modes
.raw directory - Access original binary files
Background daemon mode - Run mount in background with --daemon
All FUSE operations - read, write, create, delete, mkdir, rmdir, rename, truncate
Unit tests - Comprehensive test coverage for FUSE operations
rclone-style CLI commands - sync, copy, move, tree, size with progress bars
Background parsing - Async content parsing on write
FUSE performance optimizations - Caching (TTL/LRU), cache invalidation, metrics
Image OCR parser - Extract text from images (PNG, JPEG)
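The FUSE caching item combines two eviction rules: entries expire after a time-to-live (TTL), and the least recently used (LRU) entry is evicted when the cache is full. A minimal sketch built on `collections.OrderedDict`; the `TTLLRUCache` class, its parameters, and the hit/miss counters are hypothetical and not the Nexus implementation.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

V = TypeVar("V")


class TTLLRUCache(Generic[V]):
    """Entries expire after `ttl` seconds; the LRU entry goes when over `maxsize`."""

    def __init__(self, maxsize: int, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.maxsize, self.ttl, self.clock = maxsize, ttl, clock
        self._data: "OrderedDict[Hashable, tuple[float, V]]" = OrderedDict()
        self.hits = self.misses = 0  # simple metrics

    def get(self, key: Hashable) -> Optional[V]:
        entry = self._data.get(key)
        if entry is None or self.clock() >= entry[0]:
            # Missing or expired: drop any stale entry and count a miss.
            self._data.pop(key, None)
            self.misses += 1
            return None
        self._data.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key: Hashable, value: V) -> None:
        self._data[key] = (self.clock() + self.ttl, value)  # (expiry, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry, e.g. after a write to the same path."""
        self._data.pop(key, None)
```

The TTL bounds how stale a cached read can be when another client changes the file, and `invalidate` handles writes made through the same mount immediately.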

v0.3.0 - File Permissions & Skills System

UNIX-style file permissions (owner, group, mode)
Permission operations (chmod, chown, chgrp)
Default permission policies per namespace
Permission inheritance for new files
Permission checking in all file operations
ACL (Access Control List) support
ReBAC (Relationship-Based Access Control) - Zanzibar-style authorization
Relationship types - member-of, owner-of, parent-of, shared-with
Permission inheritance via relationships - Team ownership, group membership
Relationship graph queries - Transitive closure, path existence checks
Namespaced tuples - (subject, relation, object) authorization model
Check API - Fast permission checks with caching
Expand API - Discover all subjects with specific permissions
Relationship management - Create, delete, query relationships
Permission migration for existing files
Comprehensive permission tests
Skills System integration - Anthropic-compatible SKILL.md format
Skill discovery & loading - Progressive disclosure, lazy loading
Agent-specific skills - /workspace/{tenant}/{agent}/.nexus/skills/
Tenant-wide skill library - /shared/{tenant}/skills/
System skills - /system/skills/ (Anthropic official)
Skill templates - Pre-built templates for common patterns
Skill versioning - CAS-backed version control
Skill composition - Automatic dependency resolution
Skill analytics - Usage tracking, success rates
Skill marketplace - Org-wide skill catalog
Skill CLI commands - create, fork, publish, search
Semantic skill search - Find skills by description
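The ReBAC items describe Zanzibar-style checks over (subject, relation, object) tuples, where access can follow relationships transitively: a user who is a member of a group that owns a file can write it. A minimal in-memory sketch of a Check operation using breadth-first search over `member-of` edges. The `RelationshipStore` class, the `viewer-of` relation, and the hard-coded `GRANTS` table are hypothetical; Zanzibar-style systems normally derive permissions from configurable rewrite rules instead.

```python
from collections import defaultdict, deque

# Hypothetical rule table: which direct relations grant each permission.
GRANTS = {"read": {"owner-of", "viewer-of"}, "write": {"owner-of"}}


class RelationshipStore:
    """In-memory (subject, relation, object) tuples with a Check operation."""

    def __init__(self) -> None:
        self._edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def write(self, subject: str, relation: str, obj: str) -> None:
        self._edges[subject].add((relation, obj))

    def delete(self, subject: str, relation: str, obj: str) -> None:
        self._edges[subject].discard((relation, obj))

    def principals(self, subject: str) -> set[str]:
        """The subject plus every group reachable via member-of (transitive closure)."""
        seen, queue = {subject}, deque([subject])
        while queue:
            for relation, obj in self._edges[queue.popleft()]:
                if relation == "member-of" and obj not in seen:
                    seen.add(obj)  # `seen` also guards against membership cycles
                    queue.append(obj)
        return seen

    def check(self, subject: str, permission: str, obj: str) -> bool:
        """True if the subject, or any group it belongs to, holds a granting relation."""
        granting = GRANTS.get(permission, set())
        return any(
            (relation, obj) in self._edges[principal]
            for principal in self.principals(subject)
            for relation in granting
        )
```

Because membership is resolved at check time, removing one `member-of` tuple revokes everything granted through that group, which is the main advantage over copying permissions onto each file.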

v0.4.0 - AI Integration

LLM provider abstraction
Anthropic Claude integration
OpenAI integration
Basic KV cache for prompts
Semantic search (vector embeddings)
LLM-powered document reading

v0.5.0 - Agent Workspaces

Agent workspace structure
File-based configuration (.nexus/)
Custom command system (markdown)
Basic agent memory storage
Memory consolidation
Memory reflection phase (ACE-inspired: extract insights from execution trajectories)
Strategy/playbook organization (ACE-inspired: organize memories as reusable strategies)

v0.6.0 - Server Mode (Self-Hosted & Managed)

FastAPI REST API
API key authentication
Multi-tenancy support
PostgreSQL support
Redis caching
Docker deployment
Batch/transaction APIs (atomic multi-operation updates)
Optimistic locking for concurrent writes
Auto-scaling configuration (for hosted deployments)
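Optimistic locking typically stores a version number with each record and lets a write succeed only if the version the writer read is still current; a writer holding a stale version gets a conflict and must re-read and retry. A minimal sketch with Python's built-in `sqlite3`; the `file_versions` table, `read_versioned`, `write_if_unchanged`, and `ConflictError` are hypothetical and not the Nexus server API.

```python
import sqlite3


class ConflictError(Exception):
    """Raised when another writer updated the row first."""


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE file_versions (path TEXT PRIMARY KEY, content TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO file_versions VALUES ('/notes.txt', 'v1', 1)")


def read_versioned(path: str) -> tuple[str, int]:
    return conn.execute(
        "SELECT content, version FROM file_versions WHERE path = ?", (path,)
    ).fetchone()


def write_if_unchanged(path: str, content: str, expected_version: int) -> int:
    # Compare-and-set: the UPDATE matches only if nobody bumped the version.
    cur = conn.execute(
        "UPDATE file_versions SET content = ?, version = version + 1 "
        "WHERE path = ? AND version = ?",
        (content, path, expected_version),
    )
    conn.commit()
    if cur.rowcount != 1:
        raise ConflictError(f"{path} changed since version {expected_version}")
    return expected_version + 1
```

No lock is held between the read and the write, so readers never block; the cost is that a losing writer has to redo its work, which suits workloads where concurrent edits to the same file are rare.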

v0.7.0 - Extended Features & Event System

S3 backend support
Google Drive backend
Job system with checkpointing
OAuth token management
MCP server implementation
Webhook/event system (file changes, memory updates, job events)
Watch API for real-time updates (streaming changes to clients)
Server-Sent Events (SSE) support for live monitoring
Simple admin UI (jobs, memories, files, operation logs)
Operation logs table (track storage operations for debugging)

v0.8.0 - Advanced AI Features & Rich Query

Advanced KV cache with context tracking
Memory versioning and lineage
Multi-agent memory sharing
Enhanced semantic search
Importance-based memory preservation (ACE-inspired: prevent brevity bias in consolidation)
Context-aware memory retrieval (include execution context in search)
Automated strategy extraction (LLM-powered extraction from successful trajectories)
Rich memory query language (filter by metadata, importance, task type, date ranges, etc.)
Memory query builder API (fluent interface for complex queries)
Combined vector + metadata search (hybrid search)
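A fluent query builder, as planned above, chains filter calls that each return the builder, and runs the combined query at the end. A minimal in-memory sketch over plain dict records; `MemoryQuery`, its method names, and the record fields `task_type`, `importance`, and `created` are hypothetical and do not describe the planned Nexus API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Callable, Optional

Record = dict[str, Any]


@dataclass
class MemoryQuery:
    """Hypothetical fluent builder: each method adds a filter and returns self."""

    records: list[Record]
    _filters: list[Callable[[Record], bool]] = field(default_factory=list, init=False)
    _limit: Optional[int] = field(default=None, init=False)

    def where(self, **equals: Any) -> "MemoryQuery":
        self._filters.append(lambda r: all(r.get(k) == v for k, v in equals.items()))
        return self

    def min_importance(self, threshold: float) -> "MemoryQuery":
        self._filters.append(lambda r: r.get("importance", 0.0) >= threshold)
        return self

    def created_between(self, start: date, end: date) -> "MemoryQuery":
        self._filters.append(lambda r: start <= r["created"] <= end)
        return self

    def limit(self, n: int) -> "MemoryQuery":
        self._limit = n
        return self

    def run(self) -> list[Record]:
        """Apply all filters, then sort by importance (highest first)."""
        hits = [r for r in self.records if all(f(r) for f in self._filters)]
        hits.sort(key=lambda r: r.get("importance", 0.0), reverse=True)
        return hits if self._limit is None else hits[: self._limit]
```

For example, `MemoryQuery(memories).where(task_type="api").min_importance(0.5).run()` reads close to the question being asked. In a hybrid setup, the same filters would narrow the candidates before or after a vector similarity ranking.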

v0.9.0 - Production Readiness

Monitoring and observability
Performance optimization
Comprehensive testing
Security hardening
Documentation completion
Optional OpenTelemetry export (for framework integration)

v0.9.5 - Prompt Engineering & Optimization

Prompt version control with lineage tracking
Training dataset storage with CAS deduplication
Evaluation metrics time series (performance tracking)
Frozen inference snapshots (immutable program state)
Experiment tracking export (MLflow, W&B integration)
Prompt diff viewer (compare versions)
Regression detection alerts (performance drops)
Multi-candidate pool management (concurrent prompt testing)
Execution trace storage (detailed run logs for debugging)
Per-example evaluation results (granular performance tracking)
Optimization run grouping (experiment management)
Multi-objective tradeoff analysis (accuracy vs latency vs cost)
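Content-addressable storage (CAS), used here for dataset deduplication and skill versioning, keys each blob by a hash of its bytes, so identical content is stored once and reference-counted. A minimal in-memory sketch using SHA-256, whose hex digests are the 64-character content hashes required by the validation rules; `ContentStore` and its methods are hypothetical and not the Nexus API.

```python
import hashlib


class ContentStore:
    """Stores each distinct blob once, keyed by its SHA-256 hex digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._refs: dict[str, int] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data  # first copy: store the bytes
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def release(self, digest: str) -> None:
        """Drop one reference; delete the blob when none remain."""
        self._refs[digest] -= 1
        if self._refs[digest] == 0:
            del self._refs[digest], self._blobs[digest]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self._blobs.values())
```

Saving the same dataset twice returns the same digest and adds no storage; the blob is only deleted once every version that references it has been released.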

v0.10.0 - Production Infrastructure & Auto-Scaling

Automatic infrastructure scaling
Redis distributed locks (for large deployments)
PostgreSQL replication (for high availability)
Kubernetes deployment templates
Multi-region load balancing
Automatic migration from single-node to distributed

v1.0.0 - Production Release

Complete feature set
Production-tested
Comprehensive documentation
Migration tools
Enterprise support

Support

Issues: GitHub Issues
Discussions: GitHub Discussions
Email: support@nexus.example.com
Slack: Join our community

Built with ❤️ by the Nexus team

Release history

0.9.26 (Apr 5, 2026)
0.9.25 (Apr 5, 2026)
0.9.24 (Apr 5, 2026)
0.9.23 (Apr 4, 2026)
0.9.22 (Apr 4, 2026)
0.9.21 (Apr 4, 2026)
0.9.20 (Apr 4, 2026)
0.9.19 (Apr 2, 2026)
0.9.18 (Apr 1, 2026)
0.9.16 (Mar 30, 2026)
0.9.14 (Mar 27, 2026)
0.9.12 (Mar 25, 2026)
0.9.11 (Mar 23, 2026)
0.9.10 (Mar 23, 2026)
0.9.9 (Mar 22, 2026)
0.9.8 (Mar 19, 2026)
0.9.6 (Mar 16, 2026)
0.9.5 (Mar 16, 2026)
0.9.4 (Mar 15, 2026)
0.9.3 (Mar 15, 2026)
0.9.2 (Mar 13, 2026)
0.9.1 (Mar 11, 2026)
0.9.0 (Mar 11, 2026)
0.7.0 (Feb 1, 2026)
0.6.4 (Dec 18, 2025)
0.6.3 (Dec 17, 2025)
0.6.2 (Dec 10, 2025)
0.6.1 (Dec 9, 2025)
0.6.0 (Dec 7, 2025)
0.5.6 (Nov 18, 2025)
0.5.5 (Nov 18, 2025)
0.5.4 (Nov 14, 2025)
0.5.3 (Nov 4, 2025)
0.5.2 (Oct 31, 2025)
0.5.0 (Oct 30, 2025)
0.3.9 (Oct 23, 2025)
0.3.0 (Oct 22, 2025)
0.2.5 (Oct 21, 2025)
0.2.4 (Oct 20, 2025)
0.2.3 (Oct 20, 2025)
0.2.2 (Oct 19, 2025) - this version
0.1.3 (Oct 17, 2025)
0.1.2 (Oct 17, 2025)

Download files

Download the file for your platform.

Source Distribution

nexus_ai_fs-0.2.2.tar.gz (508.2 kB)

Uploaded Oct 19, 2025 Source

Built Distribution

nexus_ai_fs-0.2.2-py3-none-any.whl (118.7 kB)

Uploaded Oct 19, 2025 Python 3

File details

Details for the file nexus_ai_fs-0.2.2.tar.gz.

File metadata

Download URL: nexus_ai_fs-0.2.2.tar.gz
Upload date: Oct 19, 2025
Size: 508.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nexus_ai_fs-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`6473d67463bbe527264a086c6f68dbdcd128f885170ccaf5bd66a08261042a1e`
MD5	`c5c6d5aae838d2e1e1d82ad019123402`
BLAKE2b-256	`3edb98d34fb2d7222baeab873eda9a0075af0e5332563d2eca6c63ed0087fef3`

File details

Details for the file nexus_ai_fs-0.2.2-py3-none-any.whl.

File metadata

Download URL: nexus_ai_fs-0.2.2-py3-none-any.whl
Upload date: Oct 19, 2025
Size: 118.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nexus_ai_fs-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`36bd3bdeb09b44a99185a83adacef8c0dd741c3f4599386393ed45be2f1efd09`
MD5	`218f945d389747258fdcad7e32f17702`
BLAKE2b-256	`029dd0086d637e675f6f7245c60f0be83863c98943062b86b4520c90e6205cbc`
