
Creates a JSON file of the Library of Congress Classification system

Project description

Library of Congress Classification to JSON

lcc2json outputs a single JSON file of the Library of Congress Classification system.

For input, it downloads about 700 .json files (roughly 14 megabytes in total at the default crawl depth of 2) from the Library of Congress at id.loc.gov.

Install

Install from PyPI:

pip install lcc2json

Or install from the main source repository:

git clone https://spacecruft.org/books/lcc2json
cd lcc2json/
python -m venv venv
source venv/bin/activate
pip install -U setuptools pip wheel
pip install -e .

Usage

Run the two commands in order.

Download the source JSON files from the Library of Congress:

lcc2json-dl

Parse the downloaded JSON files and output a single JSON file:

lcc2json
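The two steps can also be scripted. The following is a minimal Python sketch and is not part of lcc2json itself. The helper names build_commands and run_pipeline are hypothetical. It uses only the flags documented in the Help section.

The sketch keeps the download directory (--output-dir) and the parse input directory (--input-dir) pointing at the same place. This matters because lcc2json reads from ./json/ by default.

```python
import subprocess
from pathlib import Path


def build_commands(data_dir="json", depth=2, output="lcc.json", ranges=False):
    """Return the download and parse command lines, sharing one data directory."""
    download = ["lcc2json-dl", "--output-dir", str(data_dir), "--max-depth", str(depth)]
    parse = ["lcc2json", "--input-dir", str(data_dir), "--output", str(output)]
    if ranges:
        # Optional: include start/stop/prefix range fields (larger output file).
        parse.append("--ranges")
    return download, parse


def run_pipeline(**kwargs):
    """Run the download step, then the parse step.

    check=True raises CalledProcessError if a command exits non-zero,
    so the parse step is skipped when the download step fails.
    """
    for cmd in build_commands(**kwargs):
        subprocess.run(cmd, check=True)
    return Path(kwargs.get("output", "lcc.json"))
```

For example, run_pipeline(data_dir="lcc_data", depth=2) would download into ./lcc_data/ and then write lcc.json from that directory.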

Help

Download script help:

$ lcc2json-dl --help
usage: lcc2json-dl [-h] [-o OUTPUT_DIR] [-d MAX_DEPTH] [-v] [--dry-run]

Download Library of Congress Classification JSON files from id.loc.gov

options:
  -h, --help            show this help message and exit
  -o, --output-dir OUTPUT_DIR
                        Output directory for JSON files (default: json)
  -d, --max-depth MAX_DEPTH
                        Maximum depth to crawl (default: 2)
  -v, --verbose         Enable verbose logging
  --dry-run             Show what would be downloaded without actually downloading

Examples:
  lcc2json-dl                     # Download all classifications to ./json/ (depth 2)
  lcc2json-dl --max-depth 4       # Download to depth 4 (includes subdivisions)
  lcc2json-dl -o lcc_data         # Download to ./lcc_data/
  lcc2json-dl -v                  # Verbose output
  lcc2json-dl --dry-run           # Show what would be downloaded

Depth levels:
  0 = Root classification scheme
  1 = Main classes (A-Z)
  2 = Subclass ranges (e.g., PR1-PR9680) [default]
  3 = Period/topic divisions (e.g., PR6050-PR6076)
  4 = Alphabetical ranges (e.g., PR6066.A-PR6066.Z)
  5+ = Individual entries (e.g., PR6066.A84)
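The depth levels above describe a walk down a tree, starting from the root scheme. One common way to bound such a walk is a breadth-first search with a depth cut-off. The sketch below illustrates that idea only and is not lcc2json-dl's actual implementation.

The callback get_children is hypothetical. It stands in for fetching a node's JSON and listing its child URIs, which lets the example run without network access.

```python
from collections import deque


def crawl(root, get_children, max_depth=2):
    """Breadth-first crawl from `root`, visiting nodes down to `max_depth`.

    `get_children(uri)` is a hypothetical callback returning a node's child URIs.
    Returns (uri, depth) pairs in visit order.
    """
    seen = {root}
    queue = deque([(root, 0)])
    visited = []
    while queue:
        uri, depth = queue.popleft()
        visited.append((uri, depth))
        if depth == max_depth:
            continue  # at the limit: record the node but do not expand it
        for child in get_children(uri):
            if child not in seen:  # avoid fetching the same node twice
                seen.add(child)
                queue.append((child, depth + 1))
    return visited
```

With a toy tree {"lcc": ["P", "Q"], "P": ["PR1-PR9680"]}, crawling with max_depth=2 visits lcc (depth 0), then P and Q (depth 1), then PR1-PR9680 (depth 2). Each extra level multiplies the number of fetches, which matches the rapid growth in file counts shown under JSON Data.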

JSON output script (lcc2json) help:

$ lcc2json --help
usage: lcc2json [-h] [-i INPUT_DIR] [-o OUTPUT] [-v] [--ranges]

Extract LCC outlines from downloaded JSON files.

options:
  -h, --help            show this help message and exit
  -i, --input-dir INPUT_DIR
                        Directory containing JSON files (default: json)
  -o, --output OUTPUT   Output file path (default: lcc.json)
  -v, --verbose         Enable verbose output
  --ranges              Include start/stop/prefix range fields in output (larger file size)
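The --ranges option adds start, stop, and prefix fields, but their exact format is not documented here. The following is only a hypothetical illustration of two things:

- how a range label such as PR6050-PR6076 splits into a prefix (PR) and numeric start/stop bounds;
- how those bounds can answer whether a class number falls inside the range.

The names parse_range and in_range are illustrative and are not part of lcc2json.

```python
import re

# Matches simple class-number ranges such as "PR1-PR9680" or "PR6050-PR6076".
# Assumption: the second prefix may be omitted, as in "PR1-9680".
RANGE_RE = re.compile(r"^([A-Z]{1,3})(\d+(?:\.\d+)?)-([A-Z]{1,3})?(\d+(?:\.\d+)?)$")
CLASS_RE = re.compile(r"^([A-Z]{1,3})(\d+(?:\.\d+)?)")


def parse_range(label):
    """Split a range label into (prefix, start, stop), or return None."""
    m = RANGE_RE.match(label.strip())
    if m is None:
        return None  # e.g. Cutter ranges like "PR6066.A-PR6066.Z" are not handled
    prefix, start, stop_prefix, stop = m.groups()
    if stop_prefix is not None and stop_prefix != prefix:
        return None  # ranges spanning two different prefixes are out of scope
    return prefix, float(start), float(stop)


def in_range(class_number, label):
    """True if a class number such as "PR6066.A84" falls inside `label`."""
    parsed = parse_range(label)
    m = CLASS_RE.match(class_number)
    if parsed is None or m is None:
        return False
    prefix, start, stop = parsed
    return m.group(1) == prefix and start <= float(m.group(2)) <= stop
```

For example, in_range("PR6066.A84", "PR6050-PR6076") returns True, because only the leading PR6066 is compared against the numeric bounds.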

JSON Data

Approximate figures for each crawl depth (the lcc2json-dl --max-depth setting):

Depth 1

  • 21 files.
  • ~5 second download.
  • 215K size.
  • 21 classification entries.

Depth 2

  • 698 files.
  • 2 minute download.
  • 14M size.
  • 14,786 classification entries.
  • 516 unique prefixes.

Depth 3

  • 14,581 files.
  • 2 hour download.
  • 161M size.
  • 101,699 classification entries.

Depth 4

  • 100,551 files.
  • 14 hour download.
  • 824M size.
  • 344,073 classification entries.
  • Two files are missing (HTTP 404).
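The figures above suggest that download time is driven mainly by file count. The following Python arithmetic derives throughput and average file size from those numbers.

It makes two assumptions, so the results are rough:

- the K/M sizes are binary units (1M = 1024K), as printed by `du -h`;
- the approximate download times are exact.

At depths 3 and 4, both crawls work out to about 2 files per second. That rate lets you estimate how long a crawl will take from its file count.

```python
# Figures copied from the per-depth statistics above.
# Assumption: sizes are binary units (1M = 1024K); times are approximate.
STATS = {
    1: {"files": 21, "seconds": 5, "size_k": 215},
    2: {"files": 698, "seconds": 2 * 60, "size_k": 14 * 1024},
    3: {"files": 14_581, "seconds": 2 * 3600, "size_k": 161 * 1024},
    4: {"files": 100_551, "seconds": 14 * 3600, "size_k": 824 * 1024},
}


def summarize(stats):
    """Return {depth: (files per second, average K per file)}."""
    return {
        depth: (s["files"] / s["seconds"], s["size_k"] / s["files"])
        for depth, s in stats.items()
    }


for depth, (rate, avg_k) in summarize(STATS).items():
    print(f"depth {depth}: {rate:.1f} files/s, {avg_k:.1f}K per file")
```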

Downloads

JSON data snapshots are also available at this URL, so downloading them with this script is optional:

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Copyright © 2025 Jeff Moe



Download files


Source Distribution

lcc2json-1.2.0.tar.gz (17.2 kB)

Uploaded Source

Built Distribution


lcc2json-1.2.0-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file lcc2json-1.2.0.tar.gz.

File metadata

  • Download URL: lcc2json-1.2.0.tar.gz
  • Upload date:
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for lcc2json-1.2.0.tar.gz
  • SHA256: 3c2c38a3a878d4844b5cb5a492d073edc5b4bf6b401e0411f89bf18de54e6f5a
  • MD5: 55ac53ba0b3b1d9677fca072f28269cc
  • BLAKE2b-256: 22de57ab035f3b20c3ba4ae24f1fb7352d9d63046d34fd81e06b25b4e063dd95


File details

Details for the file lcc2json-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: lcc2json-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for lcc2json-1.2.0-py3-none-any.whl
  • SHA256: b5bd0c799c05e2bd77e61a8f4eb7b8b5cf685f94c8ec691b8484ebf0d6166740
  • MD5: 4e90d86cf80dc4c0d8487c1197e21cf7
  • BLAKE2b-256: d86a6f4a16b8fc37044aa65d00ea5c5268899e8a652aaced7efa5fea047d002c
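To check a downloaded distribution against the published SHA256 digests for lcc2json 1.2.0, the following Python sketch hashes the file with the standard library's hashlib and compares the result. The names EXPECTED_SHA256, sha256_of and verify are illustrative only.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for lcc2json 1.2.0, keyed by file name.
EXPECTED_SHA256 = {
    "lcc2json-1.2.0.tar.gz": "3c2c38a3a878d4844b5cb5a492d073edc5b4bf6b401e0411f89bf18de54e6f5a",
    "lcc2json-1.2.0-py3-none-any.whl": "b5bd0c799c05e2bd77e61a8f4eb7b8b5cf685f94c8ec691b8484ebf0d6166740",
}


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the published digest for its name."""
    path = Path(path)
    want = expected.get(path.name)
    return want is not None and sha256_of(path) == want
```

For example, verify("lcc2json-1.2.0.tar.gz") returns True only for an unmodified archive. pip can also enforce this check itself in hash-checking mode (--require-hashes), where each requirement carries a --hash=sha256:<digest> option.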

