Skip to main content

BBC Micro DFS disc image tool - detokenizer and pretty-printer for BBC BASIC

Project description

beebtools

PyPI Python 3.8+ License: MIT Tests

A Python tool for working with BBC Micro DFS disc images.

beebtools can read disc catalogues, extract and detokenize BBC BASIC programs to a more human-readable (and text editor friendly) format, including a pretty-printer that makes dense BBC BASIC code more legible.

Disc images and the DFS catalogue

BBC Micro software is widely preserved as .ssd (single-sided) and .dsd (double-sided interleaved) disc images. Each image is a raw sector-by-sector dump of the original floppy disc, laid out according to Acorn's Disc Filing System (DFS).

The first two sectors of each disc side hold the catalogue: disc title, file count, and one entry per file giving its name, DFS directory prefix, load and exec addresses, byte length, and start sector. beebtools reads this catalogue and can list it in a human-readable table, sorted by name, catalogue order, or file size.

Files are extracted by DFS name (T.MYPROG, $.!BOOT) or by bare name when unambiguous. On a double-sided .dsd image both sides are catalogued; if the same bare name appears on both sides, beebtools tells you and asks you to be specific. Bulk extraction (-a) pulls every file off the disc at once.

Programs: BBC BASIC and binary files

Most files you will want to look at on a BBC Micro disc are BBC BASIC programs. They are not stored as text. The BBC Micro's BASIC ROM tokenizes programs before saving them: keywords like PRINT, GOTO, and FOR are replaced with single bytes in the range 0x80-0xFF, GOTO and GOSUB targets are encoded as compact 3-byte line-number references, and the whole thing is written as a sequence of binary line records with no human-readable structure.

Binary files (machine code, data, sound samples) are stored as raw bytes and extracted as-is.

For BASIC files, beebtools does three things in sequence:

  1. Detokenize - decode the binary line records back to LIST-style text, expanding keyword tokens, decoding line-number references, and handling REM and DATA tails correctly (they are literal ASCII and must not be expanded).

  2. Pretty-print (optional, --pretty) - add operator spacing to the raw detokenized text. BBC BASIC stores only the spaces the programmer explicitly typed, so code like IFx>100THENx=0:y=0 is normal. The pretty-printer adds spaces around operators and punctuation while leaving string literals, REM tails, and DATA tails completely untouched.

  3. Anti-listing trap detection - some 1980s software used *| followed by VDU 21 (disable output) bytes as a copy-protection trick. Typing LIST on the real machine would blank the screen after that line. beebtools converts *| statements to REM *| and strips the control characters, so the program is readable.

Features

  • Read DFS catalogues from .ssd and .dsd disc images

  • Extract individual files by DFS name (T.MYPROG, or bare MYPROG)

  • Bulk-extract everything from a disc image at once

  • Detokenize BBC BASIC II programs to LIST-style plain text

  • Pretty-printer: add operator spacing to make terse BASIC readable

  • Anti-listing trap detection: neutralise copy-protection *| traps

  • Star command awareness: *SCUMPI is passed through verbatim, no false spacing

  • Zero dependencies - pure Python 3.8+, single package

Installation

pip install beebtools

For development (installs pytest and uses an editable install):

git clone https://github.com/acscpt/beebtools
cd beebtools
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e ".[dev]"

Quick start

# List what is on a disc image
beebtools cat mydisc.dsd

# Extract and detokenize a BASIC program
beebtools extract mydisc.dsd T.MYPROG

# Extract with operator spacing added
beebtools extract mydisc.dsd T.MYPROG --pretty

# Extract everything from a double-sided disc
beebtools extract mydisc.dsd -a --pretty -d output/

Pretty-printer: what it does

Raw BBC BASIC from a tokenized file looks like this when detokenized:

  100 IFx>100ORy<0THENx=0:y=0
  110 FORi=1TO8:s=s+x*x:NEXTi
  120 SOUND1,-15,s,5:IFs>9999THENs=0

With --pretty:

  100 IFx > 100ORy < 0THENx = 0 : y = 0
  110 FORi = 1TO8 : s = s + x * x : NEXTi
  120 SOUND1, -15, s, 5 : IFs > 9999THENs = 0

Specifically, the pretty-printer adds:

  • a space between the line number and the first token

  • spaces around comparison operators: = < > <> <= >=

  • spaces around arithmetic operators: + - * /

  • padding around colon statement separators: a:b becomes a : b

  • a trailing space after each comma

  • correct unary minus context: (-x) and SOUND 1,-15,s,5 stay unary

  • string literals, REM tails, and DATA tails are never touched

  • star commands (*COMMAND) are passed through verbatim

Note that spaces between keywords and identifiers are not added - BBC BASIC stores only the spaces that were explicitly typed. The pretty-printer works on operators and punctuation, which is where the density tends to be worst.

Anti-listing traps

A common copy-protection trick was to follow a *| MOS comment with CHR$(21) (VDU 21, disable output) bytes. When you typed LIST, the screen would go blank after that line. The program was still there - you just couldn't see it.

beebtools detects *| at the start of a statement and converts it to REM *|, stripping any control characters from the tail. The comment text (if any) is preserved.

  590 *|                       <- in the tokenized file
  590 REM *|                   <- what beebtools shows you

Usage

Command Line

beebtools, once installed in a Python enabled environment, can be used from the command line.

cat

List a disc catalogue.

beebtools cat <image> [--sort name|catalog|size] [--inspect]

Lists all files on all sides of the disc with load address, exec address, length, and file type. BASIC is identified from the exec address without reading file data.

Add --inspect (-i) to also read each file's bytes and label plain ASCII text files as TEXT in the type column:

--- Side 0: BBC_MUSIC_2 (28 files) ---

  Name          Load     Exec   Length  Type
   $.!BOOT  00000000 00000000 00000018  TEXT
   T.BACHPR 00000E00 00008023 000011A4  BASIC
   T.BEETHO 00000E00 00008023 00000F6C  BASIC
   ...

Sort options:

  • name (default) - alphabetical by filename

  • catalog - original on-disc DFS order

  • size - ascending by file length

search

Search all BASIC files on a disc for lines containing a text pattern.

beebtools search <image> <pattern> [filename] [-i] [--pretty]

Detokenizes every BASIC file on the disc and scans each line for the pattern. Matching lines are printed as:

--- Side 0: T.MYPROG ---
   10 GOTO 100
  230 IF SCORE > 100 THEN GOTO 230

Options:

  • filename - limit the search to one file (e.g. T.MYPROG or bare MYPROG)

  • -i / --ignore-case - case-insensitive match

  • --pretty - apply operator spacing before matching

extract

Extract a file from a disc image.

beebtools extract <image> <filename> [-o FILE] [--pretty]

BASIC programs are automatically detected and detokenized to plain text. All other files are extracted as raw bytes.

# Print to stdout
beebtools extract mydisc.dsd T.MYPROG --pretty

# Write to a file
beebtools extract mydisc.dsd T.MYPROG -o myprog.bas --pretty

# Bare filename - works when the name is unique across all sides
beebtools extract mydisc.dsd MYPROG

For binary files written with -o, the load address, exec address, and length are printed so you have the information needed for a disassembler:

Extracted to loader.bin
$.LOADER  load=0x001900  exec=0x001900  length=512 bytes

When -o is omitted, raw bytes go directly to stdout for piping.

Bulk extract

Extract all files from a disc image by specifying the option -a.

beebtools extract <image> -a [-d DIR] [--pretty] [-s subdir|prefix]

Extracts every file from the disc.

  • BASIC programs are saved as .bas text files

  • plain ASCII text files are saved as .txt (BBC CR line endings are normalised to LF)

  • everything else is saved as .bin raw files

The output directory defaults to the disc image filename stem (bbc_d1/ for bbc_d1.dsd).

On a double-sided .dsd image, files from each side are always kept separate. The -s/--sides flag controls the layout:

  • subdir (default) - files are written into side0/ and side1/ subdirectories under the output directory:

    bbc_d1/
      side0/
        $_BOOT.bin
        T_PROG.bas
      side1/
        $_BOOT.bin
        T_GAME.bas
    
  • prefix - all files are written into the flat output directory, prefixed with side0_ or side1_:

    bbc_d1/
      side0_$_BOOT.bin
      side0_T_PROG.bas
      side1_$_BOOT.bin
      side1_T_GAME.bas
    
Filename matching

extract accepts DFS filenames in two forms:

  • Explicit: T.MYPROG, $.MENU, $.!BOOT
  • Bare: MYPROG - works when the name is unique on the disc

Ambiguous bare names report all matches:

Ambiguous filename 'LOADER' - specify with directory prefix.
  Side 0: $.LOADER
  Side 1: T.LOADER

Using as a library

beebtools can also be used as a Python library. The public API is imported directly from the beebtools package:

from beebtools import openDiscImage, detokenize, prettyPrint, isBasic, looksLikeText

sides = openDiscImage("mydisc.dsd")
for disc in sides:
    title, entries = disc.readCatalogue()
    print(f"Disc: {title}")
    for entry in entries:
        if isBasic(entry):
            data = disc.readFile(entry)
            if looksLikeText(data):
                lines = prettyPrint(detokenize(data))
                print("\n".join(lines))

Supported formats

Format Description
.ssd Single-sided 40 or 80 track
.dsd Double-sided interleaved

Both 40-track and 80-track images are supported. The tool does not currently support Watford DFS extended catalogues (62-file discs).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

beebtools-0.2.0.tar.gz (33.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

beebtools-0.2.0-py3-none-any.whl (21.9 kB view details)

Uploaded Python 3

File details

Details for the file beebtools-0.2.0.tar.gz.

File metadata

  • Download URL: beebtools-0.2.0.tar.gz
  • Upload date:
  • Size: 33.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for beebtools-0.2.0.tar.gz
Algorithm Hash digest
SHA256 7249c9af21f49b8db7852709e02eba70e3793dfedb405ae18ec3bdfb4ef09dde
MD5 f6b053822a4acc649ad2c42bf9080e96
BLAKE2b-256 906261636076d5b9c0244a2a94c2430d37c28523cdc9183ecc4eae06013262cc

See more details on using hashes here.

Provenance

The following attestation bundles were made for beebtools-0.2.0.tar.gz:

Publisher: publish.yml on acscpt/beebtools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file beebtools-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: beebtools-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 21.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for beebtools-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b6f9377e7e9d0ff75770874728250601514f7b4706ed031021cd141a793d3060
MD5 2befeb7b15cffa39eaeadf0a7dd5d17e
BLAKE2b-256 38ee7ae553181b8749a82ee3dc79c23f0e2af539b81cbb5e5faf04aad3c5ca8c

See more details on using hashes here.

Provenance

The following attestation bundles were made for beebtools-0.2.0-py3-none-any.whl:

Publisher: publish.yml on acscpt/beebtools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page