BBC Micro DFS disc image tool - detokenizer and pretty-printer for BBC BASIC
beebtools
A Python tool for working with BBC Micro DFS disc images.
beebtools can read disc catalogues, extract and detokenize BBC BASIC programs to
a more human-readable (and text editor friendly) format, including a pretty-printer
that makes dense BBC BASIC code more legible.
Disc images and the DFS catalogue
BBC Micro software is widely preserved as .ssd (single-sided) and .dsd
(double-sided interleaved) disc images. Each image is a raw sector-by-sector
dump of the original floppy disc, laid out according to Acorn's Disc Filing
System (DFS).
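As a rough illustration of the interleaved layout, the sketch below splits the bytes of a .dsd image into its two sides. It assumes the usual DFS geometry of 10 sectors of 256 bytes per track, stored track by track (side 0 track 0, side 1 track 0, side 0 track 1, and so on). The name split_dsd is hypothetical and is not part of the beebtools API.

```python
SECTOR_SIZE = 256
SECTORS_PER_TRACK = 10
TRACK_SIZE = SECTOR_SIZE * SECTORS_PER_TRACK  # 2560 bytes per track


def split_dsd(image):
    """Return (side0, side1) from an interleaved double-sided image."""
    side0 = bytearray()
    side1 = bytearray()
    for offset in range(0, len(image), TRACK_SIZE):
        track = image[offset:offset + TRACK_SIZE]
        # Even-numbered chunks belong to side 0, odd-numbered ones to side 1.
        if (offset // TRACK_SIZE) % 2 == 0:
            side0 += track
        else:
            side1 += track
    return bytes(side0), bytes(side1)
```

Each returned side then has the same layout as a single-sided .ssd image.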
The first two sectors of each disc side hold the catalogue: disc title, file
count, and one entry per file giving its name, DFS directory prefix, load and
exec addresses, byte length, and start sector. beebtools reads this catalogue
and can list it in a human-readable table, sorted by name, catalogue order, or
file size.
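The catalogue fields listed above can be decoded with a few lines of Python. This is a minimal sketch of the standard Acorn DFS catalogue layout as commonly documented, not a copy of how beebtools does it; parse_catalogue and the dictionary keys are hypothetical names chosen for illustration.

```python
def parse_catalogue(sector0, sector1):
    """Decode the two 256-byte catalogue sectors of one disc side."""
    # The 12-character title is split 8 + 4 across the two sectors.
    title = (sector0[0:8] + sector1[0:4]).decode("ascii", "replace")
    title = title.rstrip(" \x00")
    file_count = sector1[5] // 8  # byte 5 holds 8 * number of files

    entries = []
    for i in range(file_count):
        base = 8 + 8 * i
        name = sector0[base:base + 7].decode("ascii", "replace").rstrip()
        directory = chr(sector0[base + 7] & 0x7F)  # top bit is the lock flag
        info = sector1[base:base + 8]
        extra = info[6]  # packed high-order bits of the fields below
        entries.append({
            "name": f"{directory}.{name}",
            "load": info[0] | (info[1] << 8) | (((extra >> 2) & 3) << 16),
            "exec": info[2] | (info[3] << 8) | (((extra >> 6) & 3) << 16),
            "length": info[4] | (info[5] << 8) | (((extra >> 4) & 3) << 16),
            "start_sector": info[7] | ((extra & 3) << 8),
        })
    return title, entries
```

The sketch returns the raw 18-bit addresses; how a tool widens them for display is a separate choice.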
Files are extracted by DFS name (T.MYPROG, $.!BOOT) or by bare name when
unambiguous. On a double-sided .dsd image both sides are catalogued; if the
same bare name appears on both sides, beebtools tells you and asks you to be
specific. Bulk extraction (-a) pulls every file off the disc at once.
Programs: BBC BASIC and binary files
Most files you will want to look at on a BBC Micro disc are BBC BASIC programs.
They are not stored as text. The BBC Micro's BASIC ROM tokenizes programs before
saving them: keywords like PRINT, GOTO, and FOR are replaced with single
bytes in the range 0x80-0xFF, GOTO and GOSUB targets are encoded as compact
3-byte line-number references, and the whole thing is written as a sequence of
binary line records with no human-readable structure.
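To make the 3-byte line-number reference concrete, here is a small sketch of the encoding as it is commonly documented for BBC BASIC: the three bytes follow a 0x8D marker token, and the top two bits of each half of the line number are folded into the first byte so that all three bytes stay in the printable range. The function names are hypothetical.

```python
def encode_line_ref(line):
    """Encode a line number as the 3 bytes that follow the 0x8D marker."""
    lo, hi = line & 0xFF, (line >> 8) & 0xFF
    first = (((lo & 0xC0) >> 2) | ((hi & 0xC0) >> 4)) ^ 0x54
    return bytes([first, (lo & 0x3F) | 0x40, (hi & 0x3F) | 0x40])


def decode_line_ref(ref):
    """Decode the 3 bytes that follow the 0x8D marker back to a line number."""
    top = ref[0] ^ 0x54
    lo = (ref[1] & 0x3F) | ((top & 0x30) << 2)
    hi = (ref[2] & 0x3F) | ((top & 0x0C) << 4)
    return (hi << 8) | lo
```

Under this scheme a reference to line 10 is stored as the bytes 0x54 0x4A 0x40, which is why tokenized programs viewed as text often show fragments such as TJ@ after a GOTO.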
Binary files (machine code, data, sound samples) are stored as raw bytes and extracted as-is.
For BASIC files, beebtools does three things in sequence:
- Detokenize - decode the binary line records back to LIST-style text, expanding keyword tokens, decoding line-number references, and handling REM and DATA tails correctly (they are literal ASCII and must not be expanded).
- Pretty-print (optional, --pretty) - add operator spacing to the raw detokenized text. BBC BASIC stores only the spaces the programmer explicitly typed, so code like IFx>100THENx=0:y=0 is normal. The pretty-printer adds spaces around operators and punctuation while leaving string literals, REM tails, and DATA tails completely untouched.
- Anti-listing trap detection - some 1980s software used *| followed by VDU 21 (disable output) bytes as a copy-protection trick. Typing LIST on the real machine would blank the screen after that line. beebtools converts *| statements to REM *| and strips the control characters, so the program is readable.
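The detokenize step can be sketched as a walk over the line records. The example below assumes the usual BBC BASIC record layout (0x0D, line number high byte, low byte, total record length, then the body, with 0x0D 0xFF ending the program) and uses only a tiny subset of the keyword table. It is an illustration of the idea rather than the beebtools implementation, and detokenize_sketch and expand are hypothetical names.

```python
# A deliberately tiny subset of the BBC BASIC II keyword tokens.
TOKENS = {
    0x8C: "THEN", 0xDC: "DATA", 0xE3: "FOR", 0xE5: "GOTO",
    0xE7: "IF", 0xF1: "PRINT", 0xF4: "REM",
}
LINE_REF = 0x8D              # marker byte before an encoded line number
LITERAL_TAIL = {0xDC, 0xF4}  # after DATA or REM the rest of the line is literal


def expand(body):
    """Expand one line body (the bytes after the 4-byte record header)."""
    out = []
    in_string = False
    i = 0
    while i < len(body):
        b = body[i]
        if b == 0x22:                    # a double quote toggles string mode
            in_string = not in_string
            out.append('"')
        elif in_string or b < 0x80:      # literal character, copied as-is
            out.append(chr(b))
        elif b == LINE_REF and i + 3 < len(body):
            top = body[i + 1] ^ 0x54     # undo the 3-byte line-number packing
            lo = (body[i + 2] & 0x3F) | ((top & 0x30) << 2)
            hi = (body[i + 3] & 0x3F) | ((top & 0x0C) << 4)
            out.append(str((hi << 8) | lo))
            i += 3
        else:
            out.append(TOKENS.get(b, f"[&{b:02X}]"))  # unknown tokens stay visible
            if b in LITERAL_TAIL:
                # REM and DATA tails must not be expanded: copy the rest verbatim.
                out.append(body[i + 1:].decode("latin-1"))
                break
        i += 1
    return "".join(out)


def detokenize_sketch(data):
    """Return a list of (line_number, text) pairs for a tokenized program."""
    lines = []
    pos = 0
    while pos + 3 < len(data) and data[pos] == 0x0D and data[pos + 1] != 0xFF:
        number = (data[pos + 1] << 8) | data[pos + 2]
        length = data[pos + 3]           # counts the 4 header bytes too
        if length < 4:
            break                        # corrupt record: stop rather than loop
        lines.append((number, expand(data[pos + 4:pos + length])))
        pos += length
    return lines
```

Tracking string mode matters because bytes of 0x80 and above inside quotes are display control codes, not keywords.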
Features
- Read DFS catalogues from .ssd and .dsd disc images
- Extract individual files by DFS name (T.MYPROG, or bare MYPROG)
- Bulk-extract everything from a disc image at once
- Detokenize BBC BASIC II programs to LIST-style plain text
- Pretty-printer: add operator spacing to make terse BASIC readable
- Anti-listing trap detection: neutralise copy-protection *| traps
- Star command awareness: *SCUMPI is passed through verbatim, no false spacing
- Zero dependencies - pure Python 3.8+, single package
Installation
pip install beebtools
For development (installs pytest and uses an editable install):
git clone https://github.com/acscpt/beebtools
cd beebtools
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
Quick start
# List what is on a disc image
beebtools cat mydisc.dsd
# Extract and detokenize a BASIC program
beebtools extract mydisc.dsd T.MYPROG
# Extract with operator spacing added
beebtools extract mydisc.dsd T.MYPROG --pretty
# Extract everything from a double-sided disc
beebtools extract mydisc.dsd -a --pretty -d output/
Pretty-printer: what it does
Raw BBC BASIC from a tokenized file looks like this when detokenized:
100 IFx>100ORy<0THENx=0:y=0
110 FORi=1TO8:s=s+x*x:NEXTi
120 SOUND1,-15,s,5:IFs>9999THENs=0
With --pretty:
100 IFx > 100ORy < 0THENx = 0 : y = 0
110 FORi = 1TO8 : s = s + x * x : NEXTi
120 SOUND1, -15, s, 5 : IFs > 9999THENs = 0
Specifically, the pretty-printer adds:
- a space between the line number and the first token
- spaces around comparison operators: = <> < > <= >=
- spaces around arithmetic operators: + - * /
- padding around colon statement separators: a:b becomes a : b
- a trailing space after each comma
- correct unary minus context: (-x) and SOUND 1,-15,s,5 stay unary
- string literals, REM tails, and DATA tails are never touched
- star commands (*COMMAND) are passed through verbatim
Note that spaces between keywords and identifiers are not added - BBC BASIC stores only the spaces that were explicitly typed. The pretty-printer works on operators and punctuation, which is where the density tends to be worst.
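The operator and punctuation rules can be approximated with a short regular-expression pass. The sketch below is a simplified stand-in, not the beebtools pretty-printer: it handles string literals, commas, colons, and a basic unary-minus check, but it knows nothing about keywords, REM or DATA tails, star commands, or exponent notation such as 1E-5. The name space_operators is hypothetical.

```python
import re

# Strings first, then two-character operators, then single characters,
# then any run of other text. Longest operators come first so that "<="
# is not split into "<" and "=".
PIECES = re.compile(r'"[^"]*"?|<=|>=|<>|[=<>+\-*/:,]|[^"=<>+\-*/:,]+')
BINARY = {"=", "<>", "<", ">", "<=", ">=", "+", "-", "*", "/", ":"}


def space_operators(text):
    """Add spaces around operators, colons and commas outside strings."""
    out = ""
    prev = ""  # last non-blank piece seen, used to spot a unary minus
    for piece in PIECES.findall(text):
        if piece.startswith('"'):
            out += piece                         # string literal: untouched
        elif piece == "-" and (prev in ("", ",") or prev in BINARY
                               or prev.endswith("(")):
            out += "-"                           # unary minus stays attached
        elif piece == ",":
            out = out.rstrip() + ", "
        elif piece in BINARY:
            out = out.rstrip() + f" {piece} "
        else:
            # Avoid doubling a space the programmer already typed.
            out += piece.lstrip() if out.endswith(" ") else piece
        prev = piece.strip() or prev
    return out.rstrip()
```

Applied to the bodies of lines 100 to 120 in the example above (without their line numbers), this sketch yields the same spacing as the --pretty output shown there.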
Anti-listing traps
A common copy-protection trick was to follow a *| MOS comment with
CHR$(21) (VDU 21, disable output) bytes. When you typed LIST, the screen
would go blank after that line. The program was still there - you just couldn't
see it.
beebtools detects *| at the start of a statement and converts it to REM *|,
stripping any control characters from the tail. The comment text (if any) is
preserved.
590 *| <- in the tokenized file
590 REM *| <- what beebtools shows you
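A minimal sketch of that conversion, applied to one already-detokenized statement, might look like the following. It assumes that stripping control characters means keeping printable ASCII only; the function name neutralise_trap is hypothetical.

```python
def neutralise_trap(statement):
    """Turn a '*|' statement into a harmless REM, dropping control characters."""
    stripped = statement.lstrip()
    if not stripped.startswith("*|"):
        return statement  # not a trap: leave the statement alone
    # Keep printable ASCII only, so VDU 21 (0x15) and similar bytes are dropped.
    cleaned = "".join(ch for ch in stripped if 0x20 <= ord(ch) < 0x7F)
    return "REM " + cleaned.rstrip()
```

Any comment text after *| survives, because only the non-printable bytes are removed.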
Usage
Command Line
Once installed into a Python environment, beebtools can be used from the
command line.
cat
List a disc catalogue.
beebtools cat <image> [--sort name|catalog|size] [--inspect]
Lists all files on all sides of the disc with load address, exec address, length, and file type. BASIC is identified from the exec address without reading file data.
Add --inspect (-i) to also read each file's bytes and label plain ASCII
text files as TEXT in the type column:
--- Side 0: BBC_MUSIC_2 (28 files) ---
Name Load Exec Length Type
$.!BOOT 00000000 00000000 00000018 TEXT
T.BACHPR 00000E00 00008023 000011A4 BASIC
T.BEETHO 00000E00 00008023 00000F6C BASIC
...
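The type column could be derived along these lines. This is a guess at the general approach, not the rule beebtools actually applies: it assumes that BASIC is recognised by the exec addresses that SAVE writes under BBC BASIC I and II (0x801F and 0x8023), and that TEXT means every byte is printable ASCII, tab, CR, or LF. The name classify and the empty-string result for other files are hypothetical.

```python
# Assumed exec addresses written by SAVE under BBC BASIC I and II.
BASIC_EXEC_ADDRESSES = {0x801F, 0x8023}
TEXT_CONTROL_BYTES = (0x09, 0x0A, 0x0D)  # tab, LF, CR


def classify(exec_address, data=None):
    """Return 'BASIC', 'TEXT', or '' for one catalogue entry."""
    # Only the low 16 bits are compared, so high-order address bits are ignored.
    if (exec_address & 0xFFFF) in BASIC_EXEC_ADDRESSES:
        return "BASIC"
    # The content check only runs when file data was supplied (--inspect).
    if data and all(b in TEXT_CONTROL_BYTES or 0x20 <= b < 0x7F for b in data):
        return "TEXT"
    return ""
```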
Sort options:
- name (default) - alphabetical by filename
- catalog - original on-disc DFS order
- size - ascending by file length
search
Search all BASIC files on a disc for lines containing a text pattern.
beebtools search <image> <pattern> [filename] [-i] [--pretty]
Detokenizes every BASIC file on the disc and scans each line for the pattern. Matching lines are printed as:
--- Side 0: T.MYPROG ---
10 GOTO 100
230 IF SCORE > 100 THEN GOTO 230
Options:
- filename - limit the search to one file (e.g. T.MYPROG or bare MYPROG)
- -i / --ignore-case - case-insensitive match
- --pretty - apply operator spacing before matching
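The matching step itself is simple once the lines are detokenized. The sketch below assumes the pattern is a plain substring rather than a regular expression, which the text above does not specify; search_lines is a hypothetical helper.

```python
def search_lines(lines, pattern, ignore_case=False):
    """Return the detokenized lines that contain pattern."""
    needle = pattern.lower() if ignore_case else pattern
    hits = []
    for line in lines:
        haystack = line.lower() if ignore_case else line
        if needle in haystack:
            hits.append(line)  # keep the original casing in the result
    return hits
```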
extract
Extract a file from a disc image.
beebtools extract <image> <filename> [-o FILE] [--pretty]
BASIC programs are automatically detected and detokenized to plain text. All other files are extracted as raw bytes.
# Print to stdout
beebtools extract mydisc.dsd T.MYPROG --pretty
# Write to a file
beebtools extract mydisc.dsd T.MYPROG -o myprog.bas --pretty
# Bare filename - works when the name is unique across all sides
beebtools extract mydisc.dsd MYPROG
For binary files written with -o, the load address, exec address, and
length are printed so you have the information needed for a disassembler:
Extracted to loader.bin
$.LOADER load=0x001900 exec=0x001900 length=512 bytes
When -o is omitted, raw bytes go directly to stdout for piping.
Bulk extract
Extract all files from a disc image by specifying the option -a.
beebtools extract <image> -a [-d DIR] [--pretty] [-s subdir|prefix]
Extracts every file from the disc.
- BASIC programs are saved as .bas text files
- plain ASCII text files are saved as .txt (BBC CR line endings are normalised to LF)
- everything else is saved as .bin raw files
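The two rules above, choosing an extension and converting line endings, can be sketched as follows. The type labels BASIC and TEXT are taken from the catalogue listing; the helper names are hypothetical.

```python
EXTENSIONS = {"BASIC": ".bas", "TEXT": ".txt"}


def output_extension(file_type):
    """Pick the host file extension; anything unrecognised is raw binary."""
    return EXTENSIONS.get(file_type, ".bin")


def normalise_newlines(data):
    """Convert BBC CR line endings to LF in a text file's bytes."""
    return data.replace(b"\r", b"\n")
```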
The output directory defaults to the disc image filename stem (bbc_d1/ for bbc_d1.dsd).
On a double-sided .dsd image, files from each side are always kept separate.
The -s/--sides flag controls the layout:
- subdir (default) - files are written into side0/ and side1/ subdirectories under the output directory:
  bbc_d1/
    side0/
      $_BOOT.bin
      T_PROG.bas
    side1/
      $_BOOT.bin
      T_GAME.bas
- prefix - all files are written into the flat output directory, prefixed with side0_ or side1_:
  bbc_d1/
    side0_$_BOOT.bin
    side0_T_PROG.bas
    side1_$_BOOT.bin
    side1_T_GAME.bas
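The two layouts differ only in how the side number is attached to the path. A small sketch, taking the already-converted host filename as input (how DFS names map to host names is not covered here); output_path is a hypothetical helper.

```python
from pathlib import PurePosixPath


def output_path(out_dir, side, host_name, layout="subdir"):
    """Build the destination path for one extracted file."""
    if layout == "subdir":
        return PurePosixPath(out_dir) / f"side{side}" / host_name
    if layout == "prefix":
        return PurePosixPath(out_dir) / f"side{side}_{host_name}"
    raise ValueError(f"unknown layout: {layout!r}")
```

PurePosixPath is used only so the example prints forward slashes on every platform; real extraction code would more likely use pathlib.Path.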
Filename matching
extract accepts DFS filenames in two forms:
- Explicit: T.MYPROG, $.MENU, $.!BOOT
- Bare: MYPROG - works when the name is unique on the disc
Ambiguous bare names report all matches:
Ambiguous filename 'LOADER' - specify with directory prefix.
Side 0: $.LOADER
Side 1: T.LOADER
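The matching rule can be sketched as a lookup over (side, name) pairs. This is an illustration under two assumptions that the text does not confirm: that names compare case-insensitively, as they do under DFS itself, and that a name with a dot in second position is treated as explicit. find_matches is a hypothetical helper.

```python
def find_matches(catalogue, wanted):
    """Return every (side, full_name) entry that matches wanted.

    catalogue is a list of (side, full_name) pairs such as (0, "$.LOADER").
    """
    explicit = len(wanted) > 2 and wanted[1] == "."
    matches = []
    for side, full_name in catalogue:
        # Compare the whole name when explicit, else drop the "D." prefix.
        candidate = full_name if explicit else full_name[2:]
        if candidate.upper() == wanted.upper():
            matches.append((side, full_name))
    return matches
```

One match is the file to extract, none is a not-found error, and more than one is the ambiguous case reported above.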
Using as a library
beebtools can also be used as a Python library. The public API is imported
directly from the beebtools package:
from beebtools import openDiscImage, detokenize, prettyPrint, isBasic, looksLikeText

sides = openDiscImage("mydisc.dsd")
for disc in sides:
    title, entries = disc.readCatalogue()
    print(f"Disc: {title}")
    for entry in entries:
        if isBasic(entry):
            data = disc.readFile(entry)
            if looksLikeText(data):
                lines = prettyPrint(detokenize(data))
                print("\n".join(lines))
Supported formats
| Format | Description |
|---|---|
| .ssd | Single-sided 40 or 80 track |
| .dsd | Double-sided interleaved |
Both 40-track and 80-track images are supported. The tool does not currently support Watford DFS extended catalogues (62-file discs).