Collect metadata for Internet Archive collections

iacoll

iacoll will collect all the item metadata for an Internet Archive collection and store it in a LevelDB database. The database is a key/value store where the key is the unique Internet Archive item identifier, and the value is the JSON for the item metadata.
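As a rough illustration of the key/value layout described above, the database could be read back from Python along these lines. This is a minimal sketch, not part of iacoll itself: it assumes the third-party plyvel package as the LevelDB binding, assumes keys and values are stored as UTF-8 bytes, and the function names decode_record and iter_items are hypothetical.

```python
import json
from typing import Iterator, Tuple


def decode_record(key: bytes, value: bytes) -> Tuple[str, dict]:
    """Turn one raw LevelDB pair into (item identifier, metadata dict).

    Assumption: both key and value are UTF-8 encoded bytes.
    """
    return key.decode("utf-8"), json.loads(value.decode("utf-8"))


def iter_items(db_path: str) -> Iterator[Tuple[str, dict]]:
    """Yield (identifier, metadata) for every record in the database."""
    # Assumption: plyvel is installed (pip install plyvel). It is imported
    # here so that decode_record stays usable without it.
    import plyvel

    db = plyvel.DB(db_path, create_if_missing=False)
    try:
        # Iterating a plyvel DB yields (key, value) pairs as bytes.
        for key, value in db:
            yield decode_record(key, value)
    finally:
        db.close()
```

Calling iter_items with the path of the database directory would then let you loop over identifiers and their metadata without going through --dump.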

For example, you can download the metadata for items in the University of Maryland's collection:

% iacoll university_maryland_cp 

By default, iacoll creates the LevelDB database in a directory named after the collection identifier. To store it somewhere else, pass the path explicitly with --db:

% iacoll university_maryland_cp --db /path/to/my/leveldb/database

When you run iacoll again on the same collection, it looks at the existing database and fetches only newer records. If an update ever fails, you may want to force a full scan:

% iacoll university_maryland_cp --fullscan

If you would like to dump the metadata as line-oriented JSON, you can use --dump:

% iacoll university_maryland_cp --dump > university_maryland_cp.jsonl
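The dumped file can then be processed one record at a time. The following is a minimal sketch using only the Python standard library; it assumes each non-blank line of the dump is one JSON object, and the helper names read_jsonl and load_dump are hypothetical.

```python
import json
from typing import Iterable, Iterator, List


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse line-oriented JSON, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def load_dump(path: str) -> List[dict]:
    """Read a whole dump file (e.g. university_maryland_cp.jsonl) into a list."""
    with open(path, encoding="utf-8") as handle:
        return list(read_jsonl(handle))
```

For large collections, iterating over read_jsonl directly avoids holding every record in memory at once.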

Install

To install iacoll, first install Python, then run:

pip install iacoll
