Context collector for AI — gathers project files into token-limited chunks

arachna

What I believe

I'm a solo developer building tools for myself. arachna is an indie project — not a startup, not a company, not a product for sale. Free software. AGPLv3.

I believe AI tools should be independent. Not tied to a specific editor, cloud provider, or way of working. arachna doesn't lock you in. It prepares your project for AI to understand. The rest is up to you.

  • Any editor. Vim, VS Code, Cursor, Emacs — arachna doesn't care where you write code
  • Any LLM. Local models, cloud APIs, web chats — the brain is your choice
  • Plain files. No databases, no daemons, no hidden state. Everything is transparent — you can cat, grep, diff the output
  • No telemetry. No tracking, no cloud sync, no phoning home. Your code stays on your machine
  • Zero dependencies. Just Python 3.11+ stdlib. pip install arachna, that's it
  • Free software, not just open source. AGPLv3 guarantees the four freedoms. No proprietary forks. What's the difference?

What arachna does

arachna collects your project files into Markdown files ready to send to an AI. It budgets output in tokens, not lines, and splits at file, paragraph, or marker boundaries (depending on the split mode) so content is not cut mid-file.

Install

pip install arachna

Quick start

cd your-project
arachna --init
arachna --all

Creates arachna_context/ with .md files ready for AI.

Examples

Local model (Ollama)

arachna --profile code
cat arachna_context/chat-code.md | ollama run qwen2.5:32b

Cloud API (OpenAI)

arachna --profile code
# Then paste arachna_context/chat-code.md into chat.openai.com
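# Or use the API (jq JSON-escapes the file; the model name is only an example):
jq -n --rawfile ctx arachna_context/chat-code.md \
  '{model: "gpt-4o", messages: [{role: "user", content: $ctx}]}' |
curl https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d @-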

Multiple profiles for different tasks

# Give code to the Programmer agent
arachna --profile code

# Give tests to the Tester agent
arachna --profile tests

# Give docs to the Auditor agent
arachna --profile docs

# Give git history for context
arachna --profile git

Incremental mode (only changed files)

arachna --profile code --incremental
# First run: collects everything
# Second run: skips unchanged files; writes nothing if nothing changed

Agent workflow with snapshots (10-50x token savings)

arachna --snapshot --profile code --name "baseline"
# ... AI makes changes to your project ...
arachna --diff --from baseline --profile code
# Outputs only the diff, not the full project

Dry-run (preview without writing)

arachna --all --dry-run

Safety check

arachna --validate
# Checks config for errors, exits 1 if problems found

Commands

arachna --init              interactive setup
arachna --init --defaults   auto-detect everything
arachna --init --preset X   use specific preset
arachna --all               collect all profiles
arachna --profile code      collect one profile
arachna --all --dry-run     preview without writing
arachna --clean             remove collected files
arachna --list              show profiles
arachna --validate          check config for errors
arachna --doctor            run full diagnostic
arachna --install-hook      install post-commit git hook (optional)
arachna --snapshot          create snapshot (optionally --name, --profile)
arachna --snapshot delete X delete snapshot
arachna --diff              diff from HEAD (optionally --from, --profile, --format)
arachna --store stats       show store statistics
arachna --store gc          garbage collect unreferenced objects

Options

Option              Description
--output-dir path   where to write (default: arachna_context/)
--verbose           show skipped files
--compress          remove blank lines and trailing spaces
--incremental       only files changed since last run
--format FORMAT     output format: markdown (default), xml, or json
--merge             append to existing output instead of replacing
--dry-run           preview without writing files
--force             force overwrite with --install-hook

Profiles

Profiles let you separate context by role — different context for different AI tasks.

Example .arachna.json for a Python project:

{
  "project_name": "MyProject",
  "profiles": {
    "code": {
      "split_mode": "by_file",
      "directories": ["src", "app"],
      "patterns": ["*.py"],
      "files": ["pyproject.toml", "requirements.txt"],
      "pre_commands": ["tree src app"],
      "max_tokens": 16000
    },
    "tests": {
      "split_mode": "by_file",
      "directories": ["tests"],
      "patterns": ["*.py"],
      "max_tokens": 16000
    },
    "docs": {
      "split_mode": "by_file",
      "files": ["README.md", "TODO.md", "CHANGELOG.md"],
      "max_tokens": 16000
    },
    "git": {
      "split_mode": "by_marker",
      "split_marker": "\n=== COMMIT:",
      "command": "git log --reverse --format='=== COMMIT: %h ===%nTITLE: %s%n%nMESSAGE:%n%b%n'",
      "max_tokens": 16000
    }
  }
}


## Split modes

- by_file: code and docs, each file stays intact (default)
- by_paragraph: logs, splits on blank lines
- by_marker: git history, splits on custom marker
- single: everything in one file, truncates if too big
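
As a rough illustration of the by_file idea, the sketch below packs whole files into chunks under a token limit. It is not arachna's actual code: `estimate_tokens`, `pack_by_file`, and the handling of oversized files are assumptions made for this example.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def pack_by_file(files: dict[str, str], max_tokens: int) -> list[list[str]]:
    """Group file names into chunks without ever splitting a single file.

    A file larger than max_tokens simply gets a chunk of its own here;
    how the real tool treats that case is not covered by this sketch.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, text in files.items():
        cost = estimate_tokens(text)
        # Start a new chunk when this file would overflow the current one.
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        chunks.append(current)
    return chunks


# Three files of roughly 100 tokens each, with a 200-token limit per chunk.
files = {"a.py": "x" * 400, "b.py": "y" * 400, "c.py": "z" * 400}
print(pack_by_file(files, max_tokens=200))  # [['a.py', 'b.py'], ['c.py']]
```

Each inner list would correspond to one numbered output file.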

## All config fields

- split_mode: by_file, by_paragraph, by_marker, or single
- split_marker: string for by_marker mode
- directories: folders to scan
- patterns: glob patterns like ["*.py"]
- files: specific files to include
- exclude_patterns: glob patterns to skip
- pre_commands: shell commands before collection
- post_commands: shell commands after collection
- command: use command output instead of files
- max_tokens: token limit per output file
- section_format: markdown, xml, or json
- compress: safe whitespace compression (blank lines, trailing spaces).
  Does not modify indentation
- include_binary: include binaries as base64 (true/false)
- binary_extensions: whitelist like [".png"]
- binary_max_mb: max binary file size in MB

## Output

Files go to arachna_context/ (configurable):

    arachna_context/
      .arachna_manifest.json
      chat-manifest.md          # summary of all files
      chat-code.md
      chat-tests.md
      chat-docs.md
      chat-git.md

When content exceeds max_tokens, files are numbered: chat-code_1.md,
chat-code_2.md...

## Manifest and cleanup

Every created file is tracked in .arachna_manifest.json. Running --all
again removes old files automatically. With --profile, only that
profile's files are cleaned.

## Incremental mode

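With --incremental, arachna skips files unchanged since the last run.
State is kept in .arachna_cache.json as mtime_ns, size, and a SHA256 hash
per file. The check is a hybrid: when mtime_ns and size match the cache,
the file is skipped without hashing; when they differ, the SHA256 hash
decides, which filters out false positives such as a git checkout that
touches timestamps but not content.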

## Watch — snapshots and diffs

Watch is a subsystem for incremental AI workflows. Instead of sending
the full project context (50k+ tokens) every time, create a snapshot once,
then send only the changes (a diff) in later iterations.

### How it works

    # Create a baseline snapshot
    arachna --snapshot --profile code --name "before-refactor"

    # AI or developer makes changes to the project
    # ...

    # See what changed (markdown diff)
    arachna --diff --from before-refactor --profile code

    # XML output for programmatic processing
    arachna --diff --from before-refactor --format xml

### Content-addressable store

Snapshots are stored in .arachna/store/ (auto-gitignored, so never committed).
Files are deduplicated by SHA256 hash: snapshots that contain identical
content share a single stored copy.

    arachna --store stats
    # Store statistics:
    #   Snapshots: 5
    #   Objects: 127
    #   Total size: 2.3 MB
    #   Unique content: 1.1 MB (52% deduplication)

    arachna --store gc
    # Removed 15 objects (freed 2.3 MB)
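
The deduplication and garbage-collection idea can be sketched as a toy content-addressable store. `ObjectStore` and its flat on-disk layout are hypothetical; the real layout of .arachna/store/ is not documented here.

```python
import hashlib
import tempfile
from pathlib import Path


class ObjectStore:
    """Toy content-addressable store: one file per unique SHA256 digest."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        target = self.root / digest
        if not target.exists():  # identical content is stored only once
            target.write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        return (self.root / digest).read_bytes()

    def gc(self, referenced: set[str]) -> int:
        """Delete objects no snapshot refers to; return how many were removed."""
        removed = 0
        for obj in list(self.root.iterdir()):
            if obj.name not in referenced:
                obj.unlink()
                removed += 1
        return removed


with tempfile.TemporaryDirectory() as tmp:
    store = ObjectStore(Path(tmp) / "store")
    # A snapshot maps file paths to digests; unchanged files share objects.
    before = {"main.py": store.put(b"print('v1')"), "util.py": store.put(b"pass")}
    after = {"main.py": store.put(b"print('v2')"), "util.py": store.put(b"pass")}
    print(len(list(store.root.iterdir())))  # 3 objects for 4 file entries
    # Dropping the "before" snapshot leaves one unreferenced object.
    print(store.gc(set(after.values())))  # 1
```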

### Diff format

Human-readable diff optimized for AI consumption:

    ### src/main.py

    REMOVED lines 45-47:
        total = 0
        for item in items:
            total += item.price

    ADDED lines 45:
        return sum(item.price for item in items)
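
A similar layout can be produced with the standard library's difflib. This is only a sketch of the format shown above, not arachna's diff engine; `readable_diff` and `span` are names invented for the example.

```python
import difflib


def span(start: int, stop: int) -> str:
    """Turn a 0-based half-open range into a 1-based label like '45-47' or '45'."""
    first, last = start + 1, stop
    return str(first) if first == last else f"{first}-{last}"


def readable_diff(path: str, old: str, new: str) -> str:
    old_lines, new_lines = old.splitlines(), new.splitlines()
    out = [f"### {path}", ""]
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" opcode is reported as a removal followed by an addition.
        if tag in ("delete", "replace"):
            out.append(f"REMOVED lines {span(i1, i2)}:")
            out.extend(old_lines[i1:i2])
            out.append("")
        if tag in ("insert", "replace"):
            out.append(f"ADDED lines {span(j1, j2)}:")
            out.extend(new_lines[j1:j2])
            out.append("")
    return "\n".join(out).rstrip() + "\n"


old = "def total(items):\n    t = 0\n    for i in items:\n        t += i\n    return t\n"
new = "def total(items):\n    return sum(items)\n"
print(readable_diff("src/main.py", old, new))
```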

### Managing snapshots

    # List all snapshots
    arachna --snapshot

    # Delete a snapshot (objects survive for other snapshots)
    arachna --snapshot delete before-refactor

## Safety

Commands in .arachna.json (pre_commands, post_commands, command) are validated
before execution. Unknown or dangerous commands are blocked. The command
allowlist is strictly read-only — no interpreters, no filesystem modification.
Use --dry-run to preview what will be executed.
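
To make the allowlist idea concrete, here is a minimal sketch of such a check. The command lists below are hypothetical and chosen only for the example; arachna's real allowlist and validation rules may differ and are likely stricter.

```python
import shlex

# Hypothetical allowlist for this example only.
READ_ONLY_COMMANDS = {"tree", "ls", "cat", "git"}
READ_ONLY_GIT_SUBCOMMANDS = {"log", "status", "diff", "show"}
SHELL_METACHARACTERS = set(";|&><`$")


def is_allowed(command: str) -> bool:
    """Accept only known read-only commands with no shell chaining or redirection."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    if not argv or argv[0] not in READ_ONLY_COMMANDS:
        return False
    if argv[0] == "git":
        # git can modify the repository, so the subcommand is checked too.
        return len(argv) > 1 and argv[1] in READ_ONLY_GIT_SUBCOMMANDS
    return True


for cmd in ("tree src app", "git log --reverse", "rm -rf build", "cat a.txt; rm a.txt"):
    print(f"{cmd!r}: {'ok' if is_allowed(cmd) else 'blocked'}")
```

Checking the parsed argument list rather than the raw string keeps quoted arguments, such as a git log format string, from being mistaken for separate commands.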

## Doctor

arachna --doctor runs a full diagnostic — validates all profiles, checks
that directories and files exist, verifies .gitignore integration.

## Git hooks (optional)

If you prefer a git-based workflow, arachna can integrate through a
post-commit hook. It also works fine without git.

    arachna --install-hook

Configure the command in .arachna.json:

```json
{
  "hook": {
    "post-commit": "arachna --all --incremental"
  }
}
```

Tokenizer

By default arachna estimates token counts as 4 characters = 1 token. The estimate is model-independent, so leave a 20-30% safety margin below the model's context window.

Built-in (default)

No dependencies. Always works. Set max_tokens below your model's context window:

  • 8192 window → max_tokens: 6000
  • 32768 window → max_tokens: 24000
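
The two values above follow from a simple rule of thumb, sketched here. The 25% margin and the rounding down to a multiple of 1000 are assumptions picked to reproduce those numbers, and `suggest_max_tokens` is not part of arachna.

```python
def suggest_max_tokens(context_window: int, margin: float = 0.25) -> int:
    """Leave `margin` of the window free, then round down to a multiple of 1000."""
    usable = int(context_window * (1 - margin))
    return usable // 1000 * 1000


for window in (8192, 32768):
    print(window, "->", suggest_max_tokens(window))
# 8192 -> 6000
# 32768 -> 24000
```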

Custom tokenizer

Add to your .arachna.json:

  "tokenizer": "my_module:count_tokens"

Your module must export count_tokens(text) -> int:

# my_module.py
def count_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # your logic here

Cloud models

For exact token counts, install tiktoken:

pip install tiktoken

  "tokenizer": "tiktoken:cl100k_base"    # GPT-4, GPT-3.5 Turbo
  "tokenizer": "tiktoken:o200k_base"     # GPT-4o

Local models

For HuggingFace tokenizers, install transformers:

pip install transformers

  "tokenizer": "transformers:Qwen/Qwen2.5-7B-Instruct"
  "tokenizer": "transformers:mistralai/Mistral-7B-Instruct-v0.3"
  "tokenizer": "transformers:google/gemma-7b"

Note: transformers is a heavy dependency. For most local models, the built-in estimate with safety margin is sufficient.

Supported project types

arachna --init auto-detects 17 project types:

Languages

  • Python: src/, app/, lib/, pkg/, scripts/, *.py, pyproject.toml
  • JavaScript/TypeScript: src/, app/, lib/, *.js, *.ts, package.json
  • C/C++: src/, include/, *.c, *.cpp, *.h, CMakeLists.txt
  • C#: *.cs, *.csproj, *.sln
  • Swift: Sources/, *.swift, Package.swift
  • Kotlin/Java: src/, *.kt, *.java, build.gradle, pom.xml
  • Ruby: lib/, app/, *.rb, Gemfile
  • PHP: src/, app/, public/, *.php, composer.json

Engines

  • Godot: *.gd, *.tscn, *.tres, project.godot
  • Unity: Assets/, *.cs, *.unity, *.prefab
  • Unreal Engine: Source/, Content/, *.cpp, *.h, *.cs, *.uproject, *.uplugin

Infrastructure

  • Docker: Dockerfile, docker-compose.yml
  • Terraform: *.tf, *.tfvars

Service

  • tests: tests/, test/
  • docs: docs/, README.md, TODO.md, CHANGELOG.md, Makefile
  • config: pyproject.toml, package.json, go.mod, Cargo.toml, requirements.txt
  • git: git log --reverse with commit history

Custom presets

Create presets.json in your project root to add or override presets:

{
  "my_game": {
    "dirs": ["game"],
    "patterns": ["*.lua"],
    "max_tokens": 8000,
    "split_mode": "by_file",
    "detect": ["game"]
  }
}

Use with: arachna --init --preset my_game

License

arachna is free software licensed under GNU AGPLv3. This license guarantees the four essential freedoms: to run the program for any purpose, to study and modify it, to redistribute copies, and to distribute modified versions.

Why AGPLv3 and not MIT or Apache? Because permissive licenses allow proprietary forks. AGPLv3 ensures that derivative works — including software running as a network service — remain free. No proprietary forks. No closed modifications. What the community builds, the community keeps.

See LICENSE for the full legal text.
