BBC Micro DFS disc image tool - detokenizer and pretty-printer for BBC BASIC

beebtools

Requires Python 3.8+. License: MIT.

A Python tool for working with BBC Micro DFS disc images.

beebtools can read disc catalogues, and extract and detokenize BBC BASIC programs into a human-readable, text-editor-friendly format. An optional pretty-printer makes dense BBC BASIC code more legible.

Disc images and the DFS catalogue

BBC Micro software is widely preserved as .ssd (single-sided) and .dsd (double-sided interleaved) disc images. Each image is a raw sector-by-sector dump of the original floppy disc, laid out according to Acorn's Disc Filing System (DFS).
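To make "raw sector-by-sector dump" concrete, here is a minimal sketch of how a byte offset into an image could be computed. It assumes the usual Acorn DFS geometry of 10 sectors of 256 bytes per track, and the common .dsd layout in which whole tracks alternate between side 0 and side 1. The function name and parameters are illustrative only and are not part of beebtools.

```python
SECTOR_SIZE = 256        # bytes per DFS sector (assumed standard geometry)
SECTORS_PER_TRACK = 10   # sectors per track on an Acorn DFS disc


def sector_offset(track, sector, side=0, double_sided=False):
    """Return the byte offset of a sector inside an .ssd or .dsd image.

    Hypothetical helper: in an interleaved .dsd image the tracks are
    assumed to be stored as track 0 side 0, track 0 side 1, track 1 side 0...
    """
    if double_sided:
        track_index = track * 2 + side
    else:
        track_index = track
    return (track_index * SECTORS_PER_TRACK + sector) * SECTOR_SIZE


# The catalogue of side 1 on a .dsd image starts one whole track into the file.
print(sector_offset(0, 0, side=1, double_sided=True))
```

Under these assumptions the side 0 catalogue sits at offsets 0 and 256, and the side 1 catalogue of a .dsd image at offsets 2560 and 2816.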

The first two sectors of each disc side hold the catalogue: disc title, file count, and one entry per file giving its name, DFS directory prefix, load and exec addresses, byte length, and start sector. beebtools reads this catalogue and can list it in a human-readable table, sorted by name, catalogue order, or file size.
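The catalogue layout described above can be sketched as follows. This is an illustration of the standard Acorn DFS catalogue format, not the beebtools implementation: the function name and the dictionary keys are made up for this example. Sector 0 holds the first 8 title bytes and the names; sector 1 holds the rest of the title, the file count, and the addresses. Addresses and lengths are 18-bit values whose top two bits are packed into byte 6 of each sector 1 entry.

```python
def parse_catalogue(sec0, sec1):
    """Parse the two 256-byte DFS catalogue sectors (illustrative sketch)."""
    # The 12-character title is split 8 + 4 across the two sectors.
    title = (sec0[0:8] + sec1[0:4]).decode("ascii", "replace").rstrip(" \x00")
    count = sec1[5] // 8                 # byte 5 holds (number of files * 8)
    entries = []
    for n in range(1, count + 1):
        name_rec = sec0[n * 8 : n * 8 + 8]
        info = sec1[n * 8 : n * 8 + 8]
        extra = info[6]                  # packed high bits, two per field
        entries.append({
            "name": name_rec[0:7].decode("ascii", "replace").rstrip(),
            "directory": chr(name_rec[7] & 0x7F),
            "locked": bool(name_rec[7] & 0x80),
            "load": int.from_bytes(info[0:2], "little") | (((extra >> 2) & 3) << 16),
            "exec": int.from_bytes(info[2:4], "little") | (((extra >> 6) & 3) << 16),
            "length": int.from_bytes(info[4:6], "little") | (((extra >> 4) & 3) << 16),
            "start_sector": info[7] | ((extra & 3) << 8),
        })
    return title, entries


# A hand-built one-file catalogue, used only to exercise the parser.
demo_sec0 = (b"BBC_MUSI" + b"BACHPR T").ljust(256, b"\x00")
demo_sec1 = (b"C_2 " + bytes([0, 8, 0x03, 0x20])
             + bytes([0x00, 0x0E, 0x23, 0x80, 0xA4, 0x11, 0x00, 0x02])).ljust(256, b"\x00")
demo_title, demo_entries = parse_catalogue(demo_sec0, demo_sec1)
print(demo_title, demo_entries[0]["directory"] + "." + demo_entries[0]["name"])
```

For a single-sided image, the two sectors would simply be the first 512 bytes of the file.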

Files are extracted by DFS name (T.MYPROG, $.!BOOT) or by bare name when unambiguous. On a double-sided .dsd image both sides are catalogued; if the same bare name appears on both sides, beebtools tells you and asks you to be specific. Bulk extraction (-a) pulls every file off the disc at once.

Programs: BBC BASIC and binary files

Most files you will want to look at on a BBC Micro disc are BBC BASIC programs. They are not stored as text. The BBC Micro's BASIC ROM tokenizes programs before saving them: keywords like PRINT, GOTO, and FOR are replaced with single bytes in the range 0x80-0xFF, GOTO and GOSUB targets are encoded as compact 3-byte line-number references, and the whole thing is written as a sequence of binary line records with no human-readable structure.

Binary files (machine code, data, sound samples) are stored as raw bytes and extracted as-is.

For BASIC files, beebtools does three things in sequence:

  1. Detokenize - decode the binary line records back to LIST-style text, expanding keyword tokens, decoding line-number references, and handling REM and DATA tails correctly (they are literal ASCII and must not be expanded).

  2. Pretty-print (optional, --pretty) - add operator spacing to the raw detokenized text. BBC BASIC stores only the spaces the programmer explicitly typed, so code like IFx>100THENx=0:y=0 is normal. The pretty-printer adds spaces around operators and punctuation while leaving string literals, REM tails, and DATA tails completely untouched.

  3. Anti-listing trap detection - some 1980s software used *| followed by VDU 21 (disable output) bytes as a copy-protection trick. Typing LIST on the real machine would blank the screen after that line. beebtools converts *| statements to REM *| and strips the control characters, so the program is readable.
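As a rough illustration of the detokenize step, the sketch below walks the binary line records and expands tokens. It is not the beebtools code: the function names are invented here, the keyword table is deliberately a tiny subset of the BBC BASIC II tokens, and DATA tails are not handled. It assumes the usual record layout (0x0D, line number high byte, line number low byte, record length, then the body, with 0x0D 0xFF ending the program) and the usual 0x8D marker in front of each 3-byte line-number reference.

```python
# Partial keyword table: a few BBC BASIC II tokens only, enough for the demo.
TOKENS = {0x8C: "THEN", 0xE5: "GOTO", 0xE7: "IF", 0xF1: "PRINT", 0xF4: "REM"}


def decode_line_ref(b1, b2, b3):
    """Decode the three bytes that follow the 0x8D marker into a line number."""
    x = b1 ^ 0x54
    lo = ((x & 0x30) << 2) | (b2 & 0x3F)
    hi = ((x & 0x0C) << 4) | (b3 & 0x3F)
    return (hi << 8) | lo


def detokenize_sketch(data):
    """Return LIST-style text lines for a tokenized program (simplified)."""
    lines, pos = [], 0
    while pos + 1 < len(data) and data[pos] == 0x0D:
        if data[pos + 1] == 0xFF:        # 0x0D 0xFF marks the end of the program
            break
        if pos + 3 >= len(data):         # truncated record header
            break
        number = (data[pos + 1] << 8) | data[pos + 2]
        length = data[pos + 3]           # whole record, including the 4-byte header
        if length < 4:                   # malformed record; stop rather than loop
            break
        body = data[pos + 4 : pos + length]
        text, i = [], 0
        literal = in_string = False      # literal becomes True after REM
        while i < len(body):
            b = body[i]
            if b == 0x22 and not literal:
                in_string = not in_string
            if literal or in_string or b < 0x80:
                text.append(chr(b))      # plain character, copied as-is
            elif b == 0x8D and i + 3 < len(body):
                text.append(str(decode_line_ref(*body[i + 1 : i + 4])))
                i += 3
            else:
                text.append(TOKENS.get(b, f"[&{b:02X}]"))
                literal = b == 0xF4      # everything after REM is literal
            i += 1
        lines.append(f"{number} {''.join(text)}")
        pos += length
    return lines


# Two hand-assembled lines: 10 PRINT"HI" and 20 GOTO10.
demo = bytes([0x0D, 0x00, 0x0A, 0x09, 0xF1, 0x22, 0x48, 0x49, 0x22,
              0x0D, 0x00, 0x14, 0x09, 0xE5, 0x8D, 0x54, 0x4A, 0x40,
              0x0D, 0xFF])
print("\n".join(detokenize_sketch(demo)))
```

Tokens missing from the partial table are shown as a bracketed hex placeholder so that gaps in the table stay visible instead of silently corrupting the listing.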

Features

  • Read DFS catalogues from .ssd and .dsd disc images

  • Extract individual files by DFS name (T.MYPROG, or bare MYPROG)

  • Bulk-extract everything from a disc image at once

  • Detokenize BBC BASIC II programs to LIST-style plain text

  • Pretty-printer: add operator spacing to make terse BASIC readable

  • Anti-listing trap detection: neutralise copy-protection *| traps

  • Star command awareness: *SCUMPI is passed through verbatim, no false spacing

  • Zero dependencies - pure Python 3.8+, single package

Installation

pip install beebtools

For development (installs pytest and uses an editable install):

git clone https://github.com/acscpt/beebtools
cd beebtools
pip install -e ".[dev]"

Quick start

# List what is on a disc image
beebtools cat mydisc.dsd

# Extract and detokenize a BASIC program
beebtools extract mydisc.dsd T.MYPROG

# Extract with operator spacing added
beebtools extract mydisc.dsd T.MYPROG --pretty

# Extract everything from a double-sided disc
beebtools extract mydisc.dsd -a --pretty -d output/

Pretty-printer: what it does

Raw BBC BASIC from a tokenized file looks like this when detokenized:

  100 IFx>100ORy<0THENx=0:y=0
  110 FORi=1TO8:s=s+x*x:NEXTi
  120 SOUND1,-15,s,5:IFs>9999THENs=0

With --pretty:

  100 IFx > 100ORy < 0THENx = 0 : y = 0
  110 FORi = 1TO8 : s = s + x * x : NEXTi
  120 SOUND1, -15, s, 5 : IFs > 9999THENs = 0

Specifically, the pretty-printer adds:

  • a space between the line number and the first token

  • spaces around comparison operators: = < > <> <= >=

  • spaces around arithmetic operators: + - * /

  • padding around colon statement separators: a:b becomes a : b

  • a trailing space after each comma

  • correct unary minus context: (-x) and SOUND 1,-15,s,5 stay unary

  • string literals, REM tails, and DATA tails are never touched

  • star commands (*COMMAND) are passed through verbatim

Note that spaces between keywords and identifiers are not added - BBC BASIC stores only the spaces that were explicitly typed. The pretty-printer works on operators and punctuation, which is where the density tends to be worst.
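The rules above can be approximated in a few dozen lines. The following is a simplified sketch, not the beebtools pretty-printer: the function name is hypothetical, it works on the text after the line number, and it treats a minus sign as unary only when it follows an operator, comma, colon, or opening bracket. A minus directly after a keyword (PRINT-x) or inside an exponent (1E-5) would therefore be spaced as subtraction. It also collapses runs of typed spaces outside strings to a single space.

```python
# A '-' that follows one of these (or starts the line) is a sign, not subtraction.
UNARY_CONTEXT = set("(,=<>+-*/:") | {""}


def pretty_sketch(code):
    """Add operator spacing to one detokenized line body (simplified sketch)."""
    out = []
    prev = ""                 # last non-space character consumed
    statement_start = True    # True at line start and directly after a colon
    i = 0

    def spaced(op):
        # Do not double a space that is already present before the operator.
        lead = "" if out and out[-1].endswith(" ") else " "
        out.append(f"{lead}{op} ")

    while i < len(code):
        ch = code[i]
        if ch == " ":
            if not (out and out[-1].endswith(" ")):
                out.append(ch)
            i += 1
            continue
        # Star commands, REM tails and DATA tails run to the end of the line.
        if (statement_start and ch == "*") or code.startswith(("REM", "DATA"), i):
            out.append(code[i:])
            break
        if ch == '"':         # copy a string literal through untouched
            end = code.find('"', i + 1)
            end = len(code) if end == -1 else end + 1
            out.append(code[i:end])
            i, prev, statement_start = end, '"', False
            continue
        two = code[i : i + 2]
        if two in ("<>", "<=", ">="):
            spaced(two)
            i, prev, statement_start = i + 2, two[1], False
            continue
        if ch in "=<>+*/" or (ch == "-" and prev not in UNARY_CONTEXT):
            spaced(ch)
        elif ch == ":":
            spaced(ch)
        elif ch == ",":
            out.append(", ")
        else:
            out.append(ch)    # includes a unary minus
        statement_start = ch == ":"
        prev = ch
        i += 1
    return "".join(out)


print(pretty_sketch("SOUND1,-15,s,5:IFs>9999THENs=0"))
```

Run on the three example lines from this section, the sketch reproduces the --pretty output shown earlier.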

Anti-listing traps

A common anti-snooping trick was to follow a *| MOS comment with CHR$(21) (VDU 21, disable output) bytes. When you typed LIST, the screen would go blank after that line. The program was still there - you just couldn't see it.

beebtools detects *| at the start of a statement and converts it to REM *|, stripping any control characters from the tail. The comment text (if any) is preserved.

  590 *|                       <- in the tokenized file
  590 REM *|                   <- what beebtools shows you
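The conversion can be illustrated with a short sketch. This is an assumption about how such a filter might look, not the beebtools source; the function name is hypothetical and it operates on a single detokenized statement.

```python
def neutralise_trap(statement):
    """Turn a '*|' statement into a harmless REM and drop control characters."""
    stripped = statement.lstrip()
    if not stripped.startswith("*|"):
        return statement                      # not a trap, leave untouched
    # Remove ASCII control characters such as CHR$(21), keep the comment text.
    cleaned = "".join(c for c in stripped if c >= " " and c != "\x7f")
    return "REM " + cleaned


print(neutralise_trap("*|\x15\x15"))
```

Statements that do not start with *| are returned unchanged, so the filter is safe to apply to every statement in a program.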

Usage

Command Line

Once installed in a Python environment, beebtools can be run from the command line.

cat

List a disc catalogue.

beebtools cat <image> [--sort name|catalog|size]

Lists all files on all sides of the disc with load address, exec address, length, and file type.

--- Side 0: BBC_MUSIC_2 (28 files) ---

  Name          Load     Exec   Length  Type
   $.!BOOT  00000000 00000000 00000018
   T.BACHPR 00000E00 00008023 000011A4  BASIC
   T.BEETHO 00000E00 00008023 00000F6C  BASIC
   ...

Sort options:

  • name (default) - alphabetical by filename

  • catalog - original on-disc DFS order

  • size - ascending by file length

extract

Extract a file from a disc image.

beebtools extract <image> <filename> [-o FILE] [--pretty]

BASIC programs are automatically detected and detokenized to plain text. All other files are extracted as raw bytes.

# Print to stdout
beebtools extract mydisc.dsd T.MYPROG --pretty

# Write to a file
beebtools extract mydisc.dsd T.MYPROG -o myprog.bas --pretty

# Bare filename - works when the name is unique across all sides
beebtools extract mydisc.dsd MYPROG

For binary files written with -o, the load address, exec address, and length are printed so you have the information needed for a disassembler:

Extracted to loader.bin
$.LOADER  load=0x001900  exec=0x001900  length=512 bytes

When -o is omitted, raw bytes go directly to stdout for piping.

Bulk extract

Extract all files from a disc image by specifying the option -a.

beebtools extract <image> -a [-d DIR] [--pretty]

Each file is written according to its type:

  • BASIC programs are saved as .bas text files

  • binaries are saved as .bin raw files

The output directory defaults to the disc image filename stem (bbc_d1/ for bbc_d1.dsd).

On a double-sided .dsd image, files from each side are prefixed with side0_ or side1_ to prevent collisions between identically-named files.

Filename matching

extract accepts DFS filenames in two forms:

  • Explicit: T.MYPROG, $.MENU, $.!BOOT
  • Bare: MYPROG - works when the name is unique on the disc

Ambiguous bare names report all matches:

Ambiguous filename 'LOADER' - specify with directory prefix.
  Side 0: $.LOADER
  Side 1: T.LOADER
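The matching rule can be sketched as below. This is an illustration only: the function name and the (side, directory, name) tuple shape are invented for the example, and case-insensitive comparison is an assumption based on DFS behaviour rather than something this README states.

```python
def resolve(name, catalogue):
    """Return every catalogue entry matching an explicit or bare DFS name.

    catalogue is a list of (side, directory, filename) tuples (hypothetical shape).
    """
    wanted = name.upper()
    if len(wanted) > 2 and wanted[1] == ".":
        # Explicit form: a one-character directory, a dot, then the name.
        directory, bare = wanted[0], wanted[2:]
        return [e for e in catalogue
                if e[1].upper() == directory and e[2].upper() == bare]
    # Bare form: match the filename in any directory on any side.
    return [e for e in catalogue if e[2].upper() == wanted]


demo_catalogue = [(0, "$", "LOADER"), (1, "T", "LOADER"), (0, "T", "MYPROG")]
matches = resolve("LOADER", demo_catalogue)
if len(matches) > 1:
    print("Ambiguous filename 'LOADER' - specify with directory prefix.")
    for side, directory, filename in matches:
        print(f"  Side {side}: {directory}.{filename}")
```

A caller would treat exactly one match as success, zero matches as "not found", and more than one as the ambiguous case shown above.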

Using as a library

beebtools can also be used as a Python library. The public API is imported directly from the beebtools package:

from beebtools import openDiscImage, detokenize, prettyPrint, isBasic, looksLikeText

# One object per disc side (two for a .dsd image)
sides = openDiscImage("mydisc.dsd")
for disc in sides:
    title, entries = disc.readCatalogue()
    print(f"Disc: {title}")
    for entry in entries:
        # Only BASIC programs are detokenized; other files are skipped here
        if isBasic(entry):
            data = disc.readFile(entry)
            if looksLikeText(data):
                lines = prettyPrint(detokenize(data))
                print("\n".join(lines))

Supported formats

  Format  Description
  .ssd    Single-sided, 40 or 80 track
  .dsd    Double-sided interleaved

Both 40-track and 80-track images are supported. The tool does not currently support Watford DFS extended catalogues (62-file discs).
