Skip to main content
Donate to the Python Software Foundation or Purchase a PyCharm License to Benefit the PSF! Donate Now

Collect metadata for Internet Archive collections

Project description

iacoll

iacoll will collect all the item metadata for an Internet Archive collection and store it in a LevelDB database. The database is a key/value store where the key is the unique Internet Archive item identifier, and the value is the JSON for the item metadata.

For example you can download the metadata for items in the University of Maryland's collection:

% iacoll university_maryland_cp 

By default iacoll will create the LevelDB database in a directory named with the item identifier. If you would like to control this you can explicitly pass it:

% iacoll university_maryland_cp --db /path/to/my/leveldb/database

When you run iacoll repeatedly it will look at the database and only fetch newer records. If an update ever fails you may want to force a full scan:

% iacoll university_maryland_cp --fullscan

If you would like to dump the metadata as line oriented JSON you can use --dump:

% iacoll university_maryland_cp --dump > university_maryland_cp.jsonl

Install

To install iacoll you'll first need to install Python and then:

pip install iacoll

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
iacoll-0.0.3.tar.gz (3.0 kB) Copy SHA256 hash SHA256 Source None

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page