aiar (AI Archive)

Requires Python 3.9+. License: MIT.

A simple LLM-friendly archive format and utility for creating self-extracting shell archives.

Inspired by the classic Unix shar (shell archive), aiar is a format and a tool for bundling a directory structure and its files into a single, executable bash/zsh script. It's designed to make sending and receiving file collections in a chat-based or text-only environment—like interacting with an LLM—as simple as copying and pasting a single block of text.

Purpose

The primary purpose of the aiar format is to package a project's files into a single text block for use with a Large Language Model. This allows an LLM to receive or transmit a collection of files within a text-only interface, bypassing the need for binary archive formats like .zip. (Yes, I once had an LLM try to send me a base64-encoded zip file, I kid you not. And no, it wasn't a valid zip file.)

Key Features

  • Single File: The entire archive is one text file. Easy to copy, paste, and save.
  • LLM-Friendly: The format is simple for an LLM to generate or consume. Because the file content is never executed, the LLM doesn't need to worry about shell-escaping special characters.
  • Self-Contained: The extraction logic is bundled with the data. No external tools like zip or tar are needed to unpack it.

The aiar Format

An aiar script has three main parts; an exit 0 command separates the extraction logic from the data.

  1. The Unpacker Logic: A bash script that reads its own file, line by line. It looks for a unique separator line that denotes the start of a new file. This part is optional if you extract with the "aiar" tool, and omitting it may even be desirable if you don't want to run code that came directly from an LLM.
  2. The exit 0 Guard: This command prevents the shell from ever trying to execute the data section below it.
  3. The Data Payload: The raw, unescaped contents of your files, each preceded by the unique separator line.
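
The line-by-line scan described above can be sketched in Python. This is only an illustration of the idea, not the code the aiar tool ships: `split_payload` is a hypothetical helper, the short separator is made up for the demo, and the sketch assumes every section is a text section whose separator line ends in `<type>:<path>`.

```python
from typing import Dict, List, Optional


def split_payload(lines: List[str], separator: str) -> Dict[str, str]:
    """Group payload lines by file path (hypothetical helper, not the aiar API)."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in lines:
        if line.startswith(separator):
            # The rest of a separator line is assumed to be "<type>:<path>".
            _ftype, _, path = line[len(separator):].partition(":")
            current = path.rstrip("\r")  # tolerate CRLF line endings
            sections[current] = []
        elif current is not None:
            # Content lines are copied verbatim; nothing is evaluated.
            sections[current].append(line)
    return {
        path: ("\n".join(body) + "\n") if body else ""
        for path, body in sections.items()
    }


SEP = "++++:demo:"  # made-up separator for the demo
archive = [
    "exit 0",
    SEP + "t:a.txt",
    "alpha $HOME",
    SEP + "t:dir/b.txt",
    "beta",
    "gamma",
]
print(split_payload(archive, SEP))
```

Lines that appear before the first separator (the unpacker logic and the exit 0 guard) are simply ignored, which is why the same scan works whether or not the script part is present.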

Supported Formats

The aiar tool supports multiple output formats:

  • bare (.aiar) - Data-only format without self-extraction script. Most LLM-friendly for creation.
  • bash (.sh) - Self-extracting Bash script (default). Compatible with Linux/macOS/WSL.
  • python (.py) - Self-extracting Python script. Requires Python 3.6+.
  • nodejs (.js) - Self-extracting Node.js script. Requires Node.js.
  • powershell (.ps1) - Self-extracting PowerShell script. Requires PowerShell 5.1+.

All formats support both text and binary files and can be extracted using the aiar extract command.

Example Archives

Below are examples of each format. Each demonstrates the same two-file archive structure.

Bare Format (.aiar)

The bare format is the simplest and most LLM-friendly. It contains only the data with a separator definition, no extraction logic. This format can be extracted using aiar extract archive.aiar.

Note for LLMs: To create a bare format archive:

  1. Start with SEPARATOR="..." (use a unique string not found in file contents)
  2. Mark each file with :t: for text or :b: for binary
  3. Begin each file with a line made of the separator string immediately followed by t:filepath (or b:filepath), then the file content
  4. Base64-encode the contents of binary files.
SEPARATOR="++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"

++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
# Example Project

This is a sample file in the bare aiar format.

++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
Hello from aiar!
This file can contain any text content.
Special characters like $, #, ', ", `, $(cmd) are preserved literally.
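
The steps in the note above can be sketched as a small Python function. `build_bare_archive` is a hypothetical name, not part of the aiar tool, and two details are assumptions rather than documented behaviour: base64 data is written on a single line, and each text file is assumed to end with a newline.

```python
import base64
from typing import Dict, Union


def build_bare_archive(files: Dict[str, Union[str, bytes]], separator: str) -> str:
    """Render files as a bare-format archive (illustrative sketch only)."""
    out = [f'SEPARATOR="{separator}"', ""]
    for path, content in files.items():
        if isinstance(content, bytes):
            # Binary files: type "b", body is base64 text.
            out.append(f"{separator}b:{path}")
            out.append(base64.b64encode(content).decode("ascii"))
        else:
            # A content line that starts with the separator would be
            # mistaken for a new file, so refuse to write it.
            if any(line.startswith(separator) for line in content.split("\n")):
                raise ValueError(f"separator occurs in {path!r}")
            out.append(f"{separator}t:{path}")
            # Text is written verbatim; drop one trailing newline because
            # the final join adds it back.
            out.append(content[:-1] if content.endswith("\n") else content)
    return "\n".join(out) + "\n"


print(build_bare_archive(
    {"example/hello.txt": "Hello from aiar!\n", "logo.bin": b"\x00\x01\x02"},
    "++++:demo:",
))
```

Note that no escaping step appears anywhere: characters such as $ or backticks pass through untouched because the payload is never interpreted by a shell.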

Bash Format (.sh)

The Bash format is a self-extracting shell script. Save as archive.sh and run with bash archive.sh.

#!/bin/bash
# aiar: AI Archive - Self-extracting script generated by aiar-py. #
#
# Note to LLMs: this archive contains multiple files with separator lines. #
# Text files are contained between separators verbatim, binary files are base64-encoded. #
# Every line in the extractor shell script ends with a '#' character to guard against a #
# possible CRLF line ending that would otherwise cause script failure in cases where CRLF #
# line endings are not supported. Choose a random SEPARATOR string to avoid conflicts when #
# archiving archives. #
#
SEPARATOR="++++++++++--------:8c7163c6-4902-46b0-9629-f75517de083c:" #
writing=false #
#
# Function to report errors and exit cleanly #
handle_error() { #
  echo "Error: $1" >&2 #
  exit 1 #
} #
#
# Function to close the previous file descriptor and wait for bg processes #
close_previous_fd() { #
  if [ "$writing" = true ]; then #
    exec 3>&- #
    # Wait for any background process (like base64) to finish #
    wait 2>/dev/null || true #
  fi #
  writing=false #
} #
#
while IFS= read -r line || [ -n "$line" ]; do #
  if [[ "$line" == "$SEPARATOR"* ]]; then #
    close_previous_fd #
#
    payload="${line#$SEPARATOR}" #
    IFS=':' read -r type filepath <<< "$payload" #
    # Strip any trailing carriage returns (DOS line endings) #
    filepath="${filepath%$'\r'}" #
#
    if [ -n "$filepath" -a ! -e "$filepath" ]; then #
      echo "Creating: $filepath" #
      mkdir -p "$(dirname "$filepath")" || handle_error "Cannot create directory for '$filepath'." #
#
      if [ "$type" == "t" ]; then #
        exec 3>"$filepath" || handle_error "Cannot open '$filepath' for writing." #
        writing=true #
      else #
        handle_error "Invalid file type '$type' in separator." #
      fi #
    else #
        echo "Skipping already existing file: '$filepath'" #
    fi #
  elif [ "$writing" = true ]; then #
    printf '%s\n' "$line" >&3 #
  fi #
done < "$0" #
#
close_previous_fd # Close the very last file #
#
echo "Extraction complete." #
exit 0 #
#
# --- DATA --- #
#
++++++++++--------:8c7163c6-4902-46b0-9629-f75517de083c:t:example/hello.txt
Hello from aiar!
She said, "He's going to the store for $5."
++++++++++--------:8c7163c6-4902-46b0-9629-f75517de083c:t:example/README.md
# Example Project

This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
All are preserved literally.
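
The header comment in the script above advises choosing a random SEPARATOR. One way to do that, sketched here under assumptions, is to append a fresh UUID to a fixed prefix and retry in the unlikely case that the candidate already occurs in a file. `pick_separator` is a hypothetical helper; the prefix is copied from the examples in this document, and the real tool may choose its separator differently.

```python
import uuid
from typing import Iterable


def pick_separator(contents: Iterable[str],
                   prefix: str = "++++++++++--------:") -> str:
    """Return a separator string that occurs in none of the given texts."""
    texts = list(contents)
    while True:
        # uuid4() is random, so a collision with real content is very
        # unlikely, but archiving an archive makes the check worthwhile.
        candidate = f"{prefix}{uuid.uuid4()}:"
        if not any(candidate in text for text in texts):
            return candidate


print(pick_separator(["Hello from aiar!\n", "# Example Project\n"]))
```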

Python Format (.py)

The Python format is a self-extracting Python script. Save as archive.py and run with python archive.py. Files are embedded as commented lines with # prefix.

import sys, os, re, base64
from pathlib import Path

SEPARATOR="++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"
SEP = re.escape(SEPARATOR)

def _safe_dest(rel: str) -> Path:
    p = Path(rel)
    if p.is_absolute():
        raise ValueError(f"Absolute path not allowed: {rel}")
    dest = (Path(".") / p).resolve()
    if Path(".").resolve() not in (set(dest.parents) | {dest}):
        raise ValueError(f"Path escapes output root: {rel}")
    return dest

def extract_all():
    with open(__file__, "r", encoding="utf-8") as f:
        script_content = f.read()

    pat = re.compile(
        rf"^# ?{SEP}([tb]):([^\n]+)\n(.*?)(?=^# ?{SEP}[tb]:|\Z)",
        re.DOTALL | re.MULTILINE,
    )

    any_found = False
    for ftype, path, body in pat.findall(script_content):
        any_found = True
        path = path.strip()
        try:
            dest = _safe_dest(path)
        except ValueError as e:
            print(f"Warning: {e}. Skipping.")
            continue

        if dest.exists():
            print(f"Skipping already existing file: '{dest}'")
            continue

        print(f"Creating: {dest}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        
        uncommented_body = re.sub(r"^# ?", "", body, flags=re.MULTILINE)
        
        if ftype == "t":
            with open(dest, "w", encoding="utf-8", newline="\n") as out:
                out.write(uncommented_body)
        else:  # binary
            with open(dest, "wb") as out:
                out.write(base64.b64decode(uncommented_body.strip().encode("ascii"), validate=False))

    if not any_found:
        print("Error: No payload sections found in data block.")
        sys.exit(1)

extract_all()
print("Extraction complete.")
sys.exit(0)

# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
# # Example Project
# 
# This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
# All are preserved literally.
# 
# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
# Hello from aiar!
# She said, "He's going to the store for $5."
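
The comment-prefix embedding used by this format can be checked with a short round trip. The `uncomment` function below uses the same regular expression as the extractor above; `comment_out` is a hypothetical counterpart written for this sketch, and it assumes "\n" line endings.

```python
import re


def comment_out(body: str, prefix: str = "# ") -> str:
    """Prefix every line of body so the payload reads as Python comments."""
    # Split into lines that keep their "\n"; a final unterminated line is kept too.
    lines = re.findall(r"[^\n]*\n|[^\n]+", body)
    return "".join(prefix + line for line in lines)


def uncomment(body: str) -> str:
    """Strip one leading '#' and at most one following space from each line."""
    return re.sub(r"^# ?", "", body, flags=re.MULTILINE)


original = "# Example Project\n\nUses $PATH and #comment.\n"
embedded = comment_out(original)
print(embedded)
print(uncomment(embedded) == original)
```

Making the space after # optional in the pattern means a payload still decodes if an editor has trimmed trailing whitespace and turned an empty "# " line into a bare "#".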

Node.js Format (.js)

The Node.js format is a self-extracting Node.js script. Save as archive.js and run with node archive.js. Files are embedded as commented lines with // prefix.

#!/usr/bin/env node

const fs = require('fs');
const path = require('path');

function escapeRegex(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const SEPARATOR = "++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:";
const SEP = escapeRegex(SEPARATOR);

function safeDest(rel) {
    if (path.isAbsolute(rel)) {
        throw new Error(`Absolute path not allowed: ${rel}`);
    }
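    const root = process.cwd();
    const dest = path.resolve(root, rel);
    // Compare path components, not string prefixes: "/out-evil" starts with "/out".
    const relToRoot = path.relative(root, dest);
    if (relToRoot === '..' || relToRoot.startsWith('..' + path.sep) || path.isAbsolute(relToRoot)) {
        throw new Error(`Path escapes output root: ${rel}`);
    }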
    return dest;
}

function extractAll() {
    const scriptContent = fs.readFileSync(__filename, 'utf8');

    // JavaScript has no \Z anchor; (?![\s\S]) matches only at the very end of the input.
    const pat = new RegExp(
        `^// ?${SEP}([tb]):([^\\n]+)\\n(.*?)(?=(^// ?${SEP}[tb]:|(?![\\s\\S])))`,
        'gms'
    );

    const matches = [...scriptContent.matchAll(pat)];

    if (matches.length === 0) {
        console.error("Error: No payload sections found in data block.");
        process.exit(1);
    }
    
    for (const match of matches) {
        const [, ftype, relPath, body] = match;
        const cleanPath = relPath.trim();
        
        let dest;
        try {
            dest = safeDest(cleanPath);
        } catch (e) {
            console.warn(`Warning: ${e.message}. Skipping.`);
            continue;
        }

        if (fs.existsSync(dest)) {
            console.log(`Skipping already existing file: '${dest}'`);
            continue;
        }

        console.log(`Creating: ${dest}`);
        fs.mkdirSync(path.dirname(dest), { recursive: true });

        const uncommentedBody = body.replace(/^\/\/ ?/gm, '');

        if (ftype === 't') {
            fs.writeFileSync(dest, uncommentedBody, { encoding: 'utf8' });
        } else {
            const buffer = Buffer.from(uncommentedBody.trim(), 'base64');
            fs.writeFileSync(dest, buffer);
        }
    }
}

extractAll();
console.log("Extraction complete.");
process.exit(0);

// ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
// # Example Project
// 
// This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
// All are preserved literally.
// 
// ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
// Hello from aiar!
// She said, "He's going to the store for $5."
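
Each extractor refuses paths that would land outside the output directory. A plain string-prefix test is not enough for this, because a sibling directory can share a prefix with the root. The Python sketch below contrasts the two checks. Both function names are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
import tempfile
from pathlib import Path


def naive_is_inside(root: Path, rel: str) -> bool:
    """String-prefix check, shown only to illustrate the pitfall."""
    root = root.resolve()
    return str((root / rel).resolve()).startswith(str(root))


def is_inside(root: Path, rel: str) -> bool:
    """Component-wise check: True only if rel stays under root."""
    if Path(rel).is_absolute():
        return False
    root = root.resolve()
    # resolve() collapses ".." segments before the comparison.
    return (root / rel).resolve().is_relative_to(root)


root = Path(tempfile.mkdtemp())
sibling = f"../{root.name}-evil/x.txt"  # escapes root but shares its prefix

print(naive_is_inside(root, sibling))  # the prefix test is fooled
print(is_inside(root, sibling))
print(is_inside(root, "example/hello.txt"))
```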

PowerShell Format (.ps1)

The PowerShell format is a self-extracting PowerShell script. Save as archive.ps1 and run with powershell -ExecutionPolicy Bypass -File archive.ps1. Files are embedded as commented lines with # prefix.

#Requires -Version 5.1

$SEPARATOR="++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:"

function Escape-Regex {
    param([string]$String)
    return [System.Text.RegularExpressions.Regex]::Escape($String)
}

function Safe-Dest {
    param([string]$RelativePath)
    if ([System.IO.Path]::IsPathRooted($RelativePath)) {
        throw "Absolute path not allowed: $RelativePath"
    }
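    $resolvedPath = [System.IO.Path]::GetFullPath((Join-Path -Path $PWD.Path -ChildPath $RelativePath))
    # Append a trailing separator so a sibling such as "C:\out-evil" does not pass as "C:\out".
    $root = $PWD.Path.TrimEnd([System.IO.Path]::DirectorySeparatorChar) + [System.IO.Path]::DirectorySeparatorChar
    if (-not $resolvedPath.StartsWith($root)) {
        throw "Path escapes output root: $RelativePath"
    }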
    return $resolvedPath
}

function Extract-All {
    $scriptPath = $PSCommandPath
    $scriptContent = Get-Content -Path $scriptPath -Raw
    $sep = Escape-Regex "$SEPARATOR"
    $pattern = "(?ms)^#\s?$sep([tb]):([^\n]+)\n(.*?)(?=(^#\s?$sep[tb]:|\Z))"
    # Avoid assigning to $matches, an automatic variable set by the -match operator.
    $sections = [System.Text.RegularExpressions.Regex]::Matches($scriptContent, $pattern)

    if ($sections.Count -eq 0) {
        Write-Error "No payload sections found in data block."
        exit 1
    }

    foreach ($match in $sections) {
        $ftype = $match.Groups[1].Value
        $relPath = $match.Groups[2].Value.Trim()
        $body = $match.Groups[3].Value

        try {
            $dest = Safe-Dest -RelativePath $relPath
        } catch {
            Write-Warning "$_. Skipping."
            continue
        }

        if (Test-Path -LiteralPath $dest) {
            Write-Output "Skipping already existing file: '$dest'"
            continue
        }

        Write-Output "Creating: $dest"
        $null = New-Item -ItemType Directory -Force -Path (Split-Path -Path $dest -Parent)
        $uncommentedBody = $body -replace '(?m)^#\s?' , ''

        if ($ftype -eq 't') {
            Set-Content -Path $dest -Value $uncommentedBody -NoNewline -Encoding utf8
        } elseif ($ftype -eq 'b') {
            $cleanBase64String = $uncommentedBody -replace '\s'
            $bytes = [System.Convert]::FromBase64String($cleanBase64String)
            [System.IO.File]::WriteAllBytes($dest, $bytes)
        } else {
            Write-Warning "Unknown file type '$ftype' for '$relPath'. Skipping."
        }
    }
}

Extract-All
Write-Output "Extraction complete."
exit 0

# --- PAYLOAD ---
# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/README.md
# # Example Project
# 
# This file includes special characters: $PATH, #comment, 'quotes', "quotes", `backticks`, $(cmd)
# All are preserved literally.
# 
# ++++++++++--------:a1b2c3d4-5678-90ab-cdef-1234567890ab:t:example/hello.txt
# Hello from aiar!
# She said, "He's going to the store for $5."
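
Binary sections are stored as base64 text, and the extractors above remove whitespace before decoding. A minimal Python sketch of that round trip follows. `encode_binary` and `decode_binary` are hypothetical names, and wrapping the encoded text at 76 columns is an assumption made for readability, not something the format is known to require.

```python
import base64


def encode_binary(data: bytes, width: int = 76) -> str:
    """Base64-encode data and wrap it into lines of at most `width` characters."""
    encoded = base64.b64encode(data).decode("ascii")
    return "\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))


def decode_binary(body: str) -> bytes:
    """Drop all whitespace (newlines, CR, spaces) and decode the base64 text."""
    return base64.b64decode("".join(body.split()))


blob = bytes(range(256))
body = encode_binary(blob, width=16)
print(body.splitlines()[0])
print(decode_binary(body) == blob)
```

Stripping whitespace before decoding keeps a section readable even if the text was re-wrapped or picked up CRLF line endings on its way through a chat window.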

License

This project is licensed under the MIT License. See the LICENSE file for details.
