aiar (AI Archive)

A simple LLM-friendly archive format and utility for creating self-extracting shell archives.

Inspired by the classic Unix shar (shell archive), aiar is a format and a tool for bundling a directory structure and its files into a single, executable bash/zsh script. It's designed to make sending and receiving file collections in a chat-based or text-only environment—like interacting with an LLM—as simple as copying and pasting a single block of text.

Purpose

The primary purpose of the aiar format is to package a project's files into a single text block for use with a Large Language Model. This allows an LLM to receive or transmit a collection of files within a text-only interface, bypassing the need for binary archive formats like .zip. (Yes, I once had an LLM try to send me a base64-encoded zip file, I kid you not. And no, it wasn’t a valid zip file.)

Key Features

  • Single File: The entire archive is one text file. Easy to copy, paste, and save.
  • LLM-Friendly: The format is simple for an LLM to generate or consume. Because the file content is never executed, the LLM doesn't need to worry about shell-escaping special characters.
  • Self-Contained: The extraction logic is bundled with the data. No external tools like zip or tar are needed to unpack it.

The aiar Format

An aiar script has three parts: the unpacker logic and the data payload, separated by an exit 0 guard.

  1. The Unpacker Logic: A bash script that reads its own file, line by line. It looks for a unique separator line that denotes the start of a new file. This part is optional if you extract with the "aiar" tool, and omitting it may even be desirable if you don't want to run code that came directly from an LLM.
  2. The exit 0 Guard: This command prevents the shell from ever trying to execute the data section below it.
  3. The Data Payload: The raw, unescaped contents of your files, each preceded by the unique separator line.
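
The line-by-line scan described above can be sketched in Python. This is a minimal illustration, not the aiar tool's own code: the function name parse_payload is hypothetical, and it assumes each section starts with a line of the form separator, then t or b, then a colon and the path, as in the examples in this document.

```python
from typing import List, Tuple


def parse_payload(text: str, separator: str) -> List[Tuple[str, str, str]]:
    """Split an aiar-style payload into (type, path, body) sections.

    Hypothetical helper: assumes header lines look like
    <separator><t|b>:<path>.
    """
    sections: List[Tuple[str, str, List[str]]] = []
    for line in text.splitlines(keepends=True):
        if line.startswith(separator):
            # Header line: what follows the separator is "<type>:<path>".
            header = line[len(separator):].rstrip("\r\n")
            ftype, _, path = header.partition(":")
            if ftype not in ("t", "b") or not path:
                raise ValueError(f"Malformed separator line: {line!r}")
            sections.append((ftype, path, []))
        elif sections:
            # Content line: belongs to the most recently opened section.
            sections[-1][2].append(line)
        # Lines before the first header (e.g. unpacker code) are ignored.
    return [(ftype, path, "".join(body)) for ftype, path, body in sections]


SEP = "++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"
demo = "\n".join([
    f'SEPARATOR="{SEP}"',
    "",
    f"{SEP}t:example/hello.txt",
    "Hello from aiar!",
    "Costs $5, no escaping needed.",
]) + "\n"

parsed = parse_payload(demo, SEP)
print(parsed)
```

Because content lines are only ever copied, never evaluated, characters such as $ or backticks need no escaping in the payload.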

Supported Formats

The aiar tool supports multiple output formats:

  • bare (.aiar) - Data-only format without self-extraction script. Most LLM-friendly for creation.
  • bash (.sh) - Self-extracting Bash script (default). Compatible with Linux/macOS/WSL.
  • python (.py) - Self-extracting Python script. Requires Python 3.6+.
  • nodejs (.js) - Self-extracting Node.js script. Requires Node.js.
  • powershell (.ps1) - Self-extracting PowerShell script. Requires PowerShell 5.1+.

All formats support both text and binary files and can be extracted using the aiar extract command.
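
As a rough illustration of how binary data can travel in a text-only archive, the sketch below base64-encodes some bytes into short lines and decodes them again. The function names and the 76-column line width are assumptions made for this example, not part of the aiar tool.

```python
import base64


def encode_binary_body(data: bytes, width: int = 76) -> str:
    """Base64-encode bytes and split the result into lines of at most `width` chars."""
    encoded = base64.b64encode(data).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return "\n".join(lines) + "\n"


def decode_binary_body(body: str) -> bytes:
    """Reverse encode_binary_body, ignoring any whitespace between base64 lines."""
    return base64.b64decode("".join(body.split()))


blob = bytes(range(256))          # stand-in for real binary file content
body = encode_binary_body(blob)
restored = decode_binary_body(body)
print(body.splitlines()[0])
print(restored == blob)
```

Wrapping the encoded text keeps lines short enough to paste comfortably; the decoder does not depend on where the line breaks fall.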

Example Archives

Below are examples of each format. Each demonstrates the same two-file archive structure.

Bare Format (.aiar)

The bare format is the simplest and most LLM-friendly. It contains only the data with a separator definition, no extraction logic. This format can be extracted using aiar extract archive.aiar.

Note for LLMs: To create a bare format archive:

  1. Start with SEPARATOR="..." (use a unique string not found in file contents)
  2. Mark each file with :t: for text or :b: for binary
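  3. Start each file with a header line: the separator, then t: or b:, then the file path (in the example below the separator itself ends with a colon). The file content follows on the next lines.
  4. The contents of binary files are base64 encoded.

SEPARATOR="++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"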

++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
# Example Project

This is a sample file in the bare aiar format.

++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
Hello from aiar!
This file can contain any text content.
Special characters like $, #, ', ", `, $(cmd) are preserved literally.
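
The steps in the note above can be sketched as a small writer for text files. This is an illustrative sketch under stated assumptions: make_separator and build_bare_archive are hypothetical names, the separator layout simply imitates the example, and the real aiar tool may choose its separator differently.

```python
import uuid
from typing import Dict, Iterable


def make_separator(contents: Iterable[str]) -> str:
    """Return a separator string that appears in none of the given texts."""
    texts = list(contents)
    while True:
        candidate = f"++++++++++--------:{uuid.uuid4()}:"
        if not any(candidate in text for text in texts):
            return candidate


def build_bare_archive(files: Dict[str, str]) -> str:
    """Render {relative_path: text_content} as a bare-format archive string."""
    sep = make_separator(files.values())
    parts = [f'SEPARATOR="{sep}"\n', "\n"]
    for path, text in files.items():
        # Header line: separator, type marker "t", then the relative path.
        parts.append(f"{sep}t:{path}\n")
        # Make sure the next header starts on a line of its own.
        parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts)


archive = build_bare_archive({
    "example/README.md": "# Example Project\n",
    "example/hello.txt": "Hello from aiar!\nCharacters like $, # and ` stay literal.\n",
})
print(archive)
```

The sketch writes no blank line between files, since any blank line before the next header would be read back as part of the preceding file.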

Bash Format (.sh)

The Bash format is a self-extracting shell script. Save as archive.sh and run with bash archive.sh.

#!/bin/bash

SEPARATOR="++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"

writing=false
# Function to report errors and exit cleanly
handle_error() {
  echo "Error: $1" >&2
  exit 1
}

# Function to close the previous file descriptor and wait for bg processes
close_previous_fd() {
  if [ "$writing" = true ]; then
    exec 3>&-
    wait 2>/dev/null || true
  fi
  writing=false
}

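# IFS= preserves leading/trailing whitespace; the || test also handles a last line with no newline
while IFS= read -r line || [ -n "$line" ]; do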
  if [[ "$line" == "$SEPARATOR"* ]]; then
    close_previous_fd
    # Strip the separator prefix, leaving "<type>:<filepath>".
    # (Splitting the whole line on ':' would pick up the colons inside the separator.)
    payload="${line#"$SEPARATOR"}"
    type="${payload%%:*}"
    filepath="${payload#*:}"
    if [ -n "$filepath" ]; then
      echo "Creating: $filepath"
      mkdir -p "$(dirname "$filepath")" || handle_error "Cannot create directory for '$filepath'."
      if [ "$type" == "b" ]; then
        exec 3> >(base64 -d > "$filepath") || handle_error "Cannot start base64 process for '$filepath'."
        writing=true
      elif [ "$type" == "t" ]; then
        exec 3>"$filepath" || handle_error "Cannot open '$filepath' for writing."
        writing=true
      else
        handle_error "Invalid file type '$type' in separator."
      fi
    fi
  elif [ "$writing" = true ]; then
    # printf writes the line verbatim; echo could misread content such as "-n" or "-e"
    printf '%s\n' "$line" >&3
  fi
done < "$0"

close_previous_fd
echo "Extraction complete."
exit 0

# --- DATA ---

++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
# Example Project

This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
All are preserved literally.

++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
Hello from aiar!
She said, "He's going to the store for $5."

Python Format (.py)

The Python format is a self-extracting Python script. Save as archive.py and run with python archive.py. Files are embedded as commented lines with # prefix.

import sys, os, re, base64
from pathlib import Path

SEPARATOR="++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"
SEP = re.escape(SEPARATOR)

def _safe_dest(rel: str) -> Path:
    p = Path(rel)
    if p.is_absolute():
        raise ValueError(f"Absolute path not allowed: {rel}")
    dest = (Path(".") / p).resolve()
    if Path(".").resolve() not in (set(dest.parents) | {dest}):
        raise ValueError(f"Path escapes output root: {rel}")
    return dest

def extract_all():
    with open(__file__, "r", encoding="utf-8") as f:
        script_content = f.read()

    pat = re.compile(
        rf"^# ?{SEP}([tb]):([^\n]+)\n(.*?)(?=^# ?{SEP}[tb]:|\Z)",
        re.DOTALL | re.MULTILINE,)

    any_found = False
    for ftype, path, body in pat.findall(script_content):
        any_found = True
        path = path.strip()
        try:
            dest = _safe_dest(path)
        except ValueError as e:
            print(f"Warning: {e}. Skipping.")
            continue

        if dest.exists():
            print(f"Skipping already existing file: '{dest}'")
            continue

        print(f"Creating: {dest}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        
        uncommented_body = re.sub(r"^# ?", "", body, flags=re.MULTILINE)
        
        if ftype == "t":
            with open(dest, "w", encoding="utf-8", newline="\n") as out:
                out.write(uncommented_body)
        else:  # binary
            with open(dest, "wb") as out:
                out.write(base64.b64decode(uncommented_body.strip().encode("ascii"), validate=False))

    if not any_found:
        print("Error: No payload sections found in data block.")
        sys.exit(1)

extract_all()
print("Extraction complete.")
sys.exit(0)

# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
# # Example Project
# 
# This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
# All are preserved literally.
# 
# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
# Hello from aiar!
# She said, "He's going to the store for $5."
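
The comment-prefix embedding used here can be illustrated with a small round trip. The sketch below is an assumption-laden example rather than the tool's own code: comment_payload and uncomment_payload are hypothetical names, and the stripping rule (prefix plus at most one space) mirrors the regular expression in the script above.

```python
import re


def comment_payload(payload: str, prefix: str = "#") -> str:
    """Prefix every payload line with '<prefix> ' so the host language ignores it."""
    return "\n".join(f"{prefix} {line}" for line in payload.splitlines()) + "\n"


def uncomment_payload(commented: str, prefix: str = "#") -> str:
    """Remove one leading '<prefix>' and at most one following space per line."""
    return re.sub(rf"^{re.escape(prefix)} ?", "", commented, flags=re.MULTILINE)


original = "line one\n    indented\n\n# looks like a comment\n"
commented = comment_payload(original)
print(commented)
print(uncomment_payload(commented) == original)
```

Only the first prefix on each line is stripped, so content that itself begins with # (a Markdown heading, say) comes back unchanged. The same pair of functions works with prefix="//" for the Node.js layout.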

Node.js Format (.js)

The Node.js format is a self-extracting Node.js script. Save as archive.js and run with node archive.js. Files are embedded as commented lines with // prefix.

#!/usr/bin/env node

const fs = require('fs');
const path = require('path');

function escapeRegex(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const SEPARATOR = "++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:";
const SEP = escapeRegex(SEPARATOR);

function safeDest(rel) {
    if (path.isAbsolute(rel)) {
        throw new Error(`Absolute path not allowed: ${rel}`);
    }
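    const root = process.cwd();
    const dest = path.resolve(root, rel);
    // path.relative avoids the prefix pitfall where a sibling like "<root>-other" passes a startsWith check
    const relToRoot = path.relative(root, dest);
    if (relToRoot === '..' || relToRoot.startsWith('..' + path.sep) || path.isAbsolute(relToRoot)) {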
        throw new Error(`Path escapes output root: ${rel}`);
    }
    return dest;
}

function extractAll() {
    const scriptContent = fs.readFileSync(__filename, 'utf8');

    const pat = new RegExp(
        // JavaScript has no \Z anchor; (?![\s\S]) matches only at the very end of the input
        `^// ?${SEP}([tb]):([^\\n]+)\\n(.*?)(?=(^// ?${SEP}[tb]:|(?![\\s\\S])))`,
        'gms'
    );

    const matches = [...scriptContent.matchAll(pat)];

    if (matches.length === 0) {
        console.error("Error: No payload sections found in data block.");
        process.exit(1);
    }
    
    for (const match of matches) {
        const [, ftype, relPath, body] = match;
        const cleanPath = relPath.trim();
        
        let dest;
        try {
            dest = safeDest(cleanPath);
        } catch (e) {
            console.warn(`Warning: ${e.message}. Skipping.`);
            continue;
        }

        if (fs.existsSync(dest)) {
            console.log(`Skipping already existing file: '${dest}'`);
            continue;
        }

        console.log(`Creating: ${dest}`);
        fs.mkdirSync(path.dirname(dest), { recursive: true });

        const uncommentedBody = body.replace(/^\/\/ ?/gm, '');

        if (ftype === 't') {
            fs.writeFileSync(dest, uncommentedBody, { encoding: 'utf8' });
        } else {
            const buffer = Buffer.from(uncommentedBody.trim(), 'base64');
            fs.writeFileSync(dest, buffer);
        }
    }
}

extractAll();
console.log("Extraction complete.");
process.exit(0);

// ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
// # Example Project
// 
// This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
// All are preserved literally.
// 
// ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
// Hello from aiar!
// She said, "He's going to the store for $5."

PowerShell Format (.ps1)

The PowerShell format is a self-extracting PowerShell script. Save as archive.ps1 and run with powershell -ExecutionPolicy Bypass -File archive.ps1. Files are embedded as commented lines with # prefix.

#Requires -Version 5.1

$SEPARATOR="++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"

function Escape-Regex {
    param([string]$String)
    return [System.Text.RegularExpressions.Regex]::Escape($String)
}

function Safe-Dest {
    param([string]$RelativePath)
    if ([System.IO.Path]::IsPathRooted($RelativePath)) {
        throw "Absolute path not allowed: $RelativePath"
    }
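    # Compare against the root plus a trailing separator so a sibling such as "<root>-other" is rejected
    $root = $PWD.Path.TrimEnd([char[]]'\/') + [System.IO.Path]::DirectorySeparatorChar
    $resolvedPath = [System.IO.Path]::GetFullPath((Join-Path -Path $PWD.Path -ChildPath $RelativePath))
    if (-not $resolvedPath.StartsWith($root)) {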
        throw "Path escapes output root: $RelativePath"
    }
    return $resolvedPath
}

function Extract-All {
    $scriptPath = $PSCommandPath
    $scriptContent = Get-Content -Path $scriptPath -Raw
    $sep = Escape-Regex "$SEPARATOR"
    $pattern = "(?ms)^#\s?$sep([tb]):([^\n]+)\n(.*?)(?=(^#\s?$sep[tb]:|\Z))"
    # $matches is a PowerShell automatic variable, so the result is stored under another name
    $sections = [System.Text.RegularExpressions.Regex]::Matches($scriptContent, $pattern)

    if ($sections.Count -eq 0) {
        Write-Error "No payload sections found in data block."
        exit 1
    }

    foreach ($match in $sections) {
        $ftype = $match.Groups[1].Value
        $relPath = $match.Groups[2].Value.Trim()
        $body = $match.Groups[3].Value

        try {
            $dest = Safe-Dest -RelativePath $relPath
        } catch {
            Write-Warning "Warning: $_. Skipping."
            continue
        }

        if (Test-Path -LiteralPath $dest) {
            Write-Output "Skipping already existing file: '$dest'"
            continue
        }

        Write-Output "Creating: $dest"
        $null = New-Item -ItemType Directory -Force -Path (Split-Path -Path $dest -Parent)
        $uncommentedBody = $body -replace '(?m)^#\s?' , ''

        if ($ftype -eq 't') {
            Set-Content -Path $dest -Value $uncommentedBody -NoNewline -Encoding utf8
        } elseif ($ftype -eq 'b') {
            $cleanBase64String = $uncommentedBody -replace '\s'
            $bytes = [System.Convert]::FromBase64String($cleanBase64String)
            [System.IO.File]::WriteAllBytes($dest, $bytes)
        } else {
            Write-Warning "Unknown file type '$ftype' for '$relPath'. Skipping."
        }
    }
}

Extract-All
Write-Output "Extraction complete."
exit 0

# --- PAYLOAD ---
# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
# # Example Project
# 
# This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
# All are preserved literally.
# 
# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
# Hello from aiar!
# She said, "He's going to the store for $5."
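
The Python, Node.js and PowerShell extractors above each reject absolute paths and paths that escape the output root. One way to express that check, sketched in Python, compares whole path components rather than string prefixes. The names safe_dest and is_allowed and the "out" directory are hypothetical and exist only for this illustration.

```python
import os


def safe_dest(root: str, rel: str) -> str:
    """Resolve rel under root, raising ValueError if it would land outside root."""
    if os.path.isabs(rel):
        raise ValueError(f"Absolute path not allowed: {rel}")
    root_abs = os.path.realpath(root)
    dest = os.path.realpath(os.path.join(root_abs, rel))
    # commonpath compares whole components, so "out-evil" is not treated
    # as being inside "out" the way a plain string-prefix test would.
    if os.path.commonpath([root_abs, dest]) != root_abs:
        raise ValueError(f"Path escapes output root: {rel}")
    return dest


def is_allowed(root: str, rel: str) -> bool:
    """Convenience wrapper: True if rel is acceptable under root."""
    try:
        safe_dest(root, rel)
    except ValueError:
        return False
    return True


root = os.path.join(os.getcwd(), "out")  # hypothetical output directory
for candidate in ["example/hello.txt", "../out-evil/x.txt", "../../etc/passwd"]:
    print(candidate, "->", "ok" if is_allowed(root, candidate) else "rejected")
```

A check like this matters most when the archive comes straight from an LLM or another untrusted source, since the paths in the separator lines decide where files are written.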

License

This project is licensed under the MIT License. See the LICENSE file for details.
