Creates a JSON file of the Library of Congress Classification system

Library of Congress Classification to JSON

lcc2json outputs a single JSON file of the Library of Congress Classification system.

For input, it downloads about 700 .json files, roughly 14 megabytes in total at the default depth of 2, from the Library of Congress.

Install

Install from PyPI:

pip install lcc2json

Or install from the main source repository, for example:

git clone https://spacecruft.org/books/lcc2json
cd lcc2json/
python -m venv venv
source venv/bin/activate
pip install -U setuptools pip wheel
pip install -e .

Usage

Run the two commands below in order.

Download the source JSON files from the Library of Congress:

lcc2json-dl

Parse the downloaded JSON files and output a single JSON file:

lcc2json
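
The two steps above can also be driven from Python. The following is a minimal sketch, not part of lcc2json itself: it assumes both console commands are installed on PATH, and the names `PIPELINE` and `run_pipeline` are hypothetical helpers introduced only for illustration.

```python
import shutil
import subprocess

# The two console commands named in this README, in the order they must run.
PIPELINE = [["lcc2json-dl"], ["lcc2json"]]


def run_pipeline(commands=None):
    """Run each command in order, stopping at the first failure.

    Hypothetical helper: it only shells out to the documented commands.
    """
    for argv in (PIPELINE if commands is None else commands):
        if shutil.which(argv[0]) is None:
            raise FileNotFoundError(
                f"{argv[0]} is not on PATH; is lcc2json installed?"
            )
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the parse step never runs after a failed download.
        subprocess.run(argv, check=True)
```

Calling `run_pipeline()` inside the activated virtual environment should download the source files and then write the combined JSON file, using each command's defaults.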

Help

Download script help:

$ lcc2json-dl --help
usage: lcc2json-dl [-h] [-o OUTPUT_DIR] [-d MAX_DEPTH] [-v] [--dry-run]

Download Library of Congress Classification JSON files from id.loc.gov

options:
  -h, --help            show this help message and exit
  -o, --output-dir OUTPUT_DIR
                        Output directory for JSON files (default: json)
  -d, --max-depth MAX_DEPTH
                        Maximum depth to crawl (default: 2)
  -v, --verbose         Enable verbose logging
  --dry-run             Show what would be downloaded without actually downloading

Examples:
  lcc2json-dl                     # Download all classifications to ./json/ (depth 2)
  lcc2json-dl --max-depth 4       # Download to depth 4 (includes subdivisions)
  lcc2json-dl -o lcc_data         # Download to ./lcc_data/
  lcc2json-dl -v                  # Verbose output
  lcc2json-dl --dry-run           # Show what would be downloaded

Depth levels:
  0 = Root classification scheme
  1 = Main classes (A-Z)
  2 = Subclass ranges (e.g., PR1-PR9680) [default]
  3 = Period/topic divisions (e.g., PR6050-PR6076)
  4 = Alphabetical ranges (e.g., PR6066.A-PR6066.Z)
  5+ = Individual entries (e.g., PR6066.A84)
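
The options and depth levels listed in the help text above can be combined programmatically. The sketch below builds an argument list using only the flags shown in that help output; `DEPTH_LEVELS`, `describe_depth`, and `dl_command` are hypothetical names for illustration, and nothing here calls the tool.

```python
# Depth levels as listed in the lcc2json-dl help text.
DEPTH_LEVELS = {
    0: "Root classification scheme",
    1: "Main classes (A-Z)",
    2: "Subclass ranges (default)",
    3: "Period/topic divisions",
    4: "Alphabetical ranges",
}


def describe_depth(depth):
    """Return the help-text description for a crawl depth (5+ are entries)."""
    if depth < 0:
        raise ValueError("depth must be 0 or greater")
    return DEPTH_LEVELS.get(depth, "Individual entries")


def dl_command(max_depth=2, output_dir="json", verbose=False, dry_run=False):
    """Build an lcc2json-dl argument list from the documented options."""
    describe_depth(max_depth)  # rejects negative depths
    argv = ["lcc2json-dl", "--max-depth", str(max_depth),
            "--output-dir", output_dir]
    if verbose:
        argv.append("--verbose")
    if dry_run:
        argv.append("--dry-run")
    return argv


print(describe_depth(3))
print(dl_command(max_depth=4, dry_run=True))
```

The resulting list can be passed to `subprocess.run`. Starting with `dry_run=True` may be a sensible first step before a deep crawl, since the help text says it shows what would be downloaded without downloading.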

Output JSON script help:

$ lcc2json --help
usage: lcc2json [-h] [-i INPUT_DIR] [-o OUTPUT] [-v] [--ranges]

Extract LCC outlines from downloaded JSON files.

options:
  -h, --help            show this help message and exit
  -i, --input-dir INPUT_DIR
                        Directory containing JSON files (default: json)
  -o, --output OUTPUT   Output file path (default: lcc.json)
  -v, --verbose         Enable verbose output
  --ranges              Include start/stop/prefix range fields in output (larger file size)
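
Once the output file exists, it can be inspected with the standard library. This README does not describe the schema of the output, so the sketch below assumes nothing beyond valid JSON: it reports only the top-level type and item count. The helper name `summarize_json` is hypothetical, and the default path mirrors the documented default of `lcc.json`.

```python
import json
from pathlib import Path


def summarize_json(path="lcc.json"):
    """Return (top-level type name, number of top-level items).

    Makes no assumption about the structure lcc2json writes.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    # Scalars have no length, so count them as a single item.
    size = len(data) if isinstance(data, (dict, list)) else 1
    return type(data).__name__, size
```

For example, `summarize_json("lcc.json")` gives a quick check that the file parsed and is not empty. Comparing the count with and without `--ranges` may show how that flag affects the output.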

JSON Data

Depth 1

  • 21 files.
  • ~5 second download.
  • 215K size.
  • 21 classification entries.

Depth 2

  • 698 files.
  • 2 minute download.
  • 14M size.
  • 14,786 classification entries.
  • 516 unique prefixes.

Depth 3

  • 14,581 files.
  • 2 hour download.
  • 161M size.
  • 101,699 classification entries.

Depth 4

  • 100,551 files.
  • 14 hour download.
  • 824M size.
  • 344,073 classification entries.
  • Two missing (404) files.

Depth 5

  • 342,499 files.
  • 2 day download.
  • 2.9G size.
  • 766,892 classification entries.
  • Three missing files.
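
The figures above can help when choosing a value for --max-depth. The sketch below copies the file and entry counts from the Depth 1 to Depth 5 lists and picks the deepest crawl that stays under a file-count budget. The names `DEPTH_STATS`, `deepest_depth_within`, and `entries_per_file` are hypothetical, and the numbers are a snapshot that may change as the Library of Congress data changes.

```python
# (files, classification entries) per depth, copied from the
# Depth 1 to Depth 5 lists in this README.
DEPTH_STATS = {
    1: (21, 21),
    2: (698, 14_786),
    3: (14_581, 101_699),
    4: (100_551, 344_073),
    5: (342_499, 766_892),
}


def deepest_depth_within(max_files):
    """Return the deepest listed depth needing at most max_files files."""
    fitting = [depth for depth, (files, _) in DEPTH_STATS.items()
               if files <= max_files]
    return max(fitting) if fitting else None


def entries_per_file(depth):
    """Average number of classification entries per downloaded file."""
    files, entries = DEPTH_STATS[depth]
    return entries / files


print(deepest_depth_within(1_000))
print(round(entries_per_file(2), 1))
```

By these figures, the number of entries gained per downloaded file drops from depth 3 onward, which may be one reason to stay at the default depth unless individual entries are needed.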

Downloads

JSON data snapshots are also available at this URL, so running the download script is optional:

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Copyright © 2025 Jeff Moe
