
Concurrently retrieve metadata from Archive.org items.


Internet Archive data mining tools.

What Is IA Mine?

IA Mine is a command line tool and Python 3 library for data mining the Internet Archive.

Note: Documentation is currently in progress.

How Do I Get Started?

Command Line Interface

The IA Mine command line tool should work on any Unix-like operating system with Python 3 installed. To start using ia-mine, download one of the latest binaries from https://archive.org/details/iamine-pex.

# Download ia-mine and make it executable.
$ curl -L https://archive.org/download/iamine-pex/ia-mine-0.2.0.pex > ia-mine
$ chmod +x ia-mine
$ ./ia-mine -v
0.2.0

Python Library

The IA Mine Python library can be installed with pip:

# Create a Python 3 virtualenv, and install iamine.
$ virtualenv --python=python3 venv
$ . venv/bin/activate
$ pip install iamine

This will also install the ia-mine command line tool into your virtualenv:

$ which ia-mine
/home/user/venv/bin/ia-mine

Data Mining with IA Mine and jq

ia-mine retrieves metadata and search results concurrently from Archive.org, writing the returned JSON to stdout and any error messages to stderr. The JSON on stdout can then be mined with a tool such as jq. jq binaries can be downloaded at http://stedolan.github.io/jq/download/.

ia-mine can mine Archive.org search results, the items returned from search results, or items provided via an itemlist or stdin.
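
If jq is not available, the JSON on stdout can also be post-processed with a few lines of Python. The sketch below is only an illustration and does not use the iamine library itself. It assumes that each line of output is one JSON object, and that item records follow the Archive.org metadata layout with a top-level "metadata" object (containing "identifier") and a "files" list. The function names iter_records and summarize are hypothetical.

```python
import json
import sys


def iter_records(lines):
    """Yield one dict per non-empty line that parses as a JSON object."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not valid JSON.
            continue
        if isinstance(record, dict):
            yield record


def summarize(record):
    """Return (identifier, file_count) for one record.

    Assumes the layout described above; missing keys fall back to
    None and 0 instead of raising.
    """
    metadata = record.get("metadata") or {}
    files = record.get("files") or []
    return metadata.get("identifier"), len(files)


if __name__ == "__main__":
    # Read records from stdin and print a tab-separated summary.
    for record in iter_records(sys.stdin):
        identifier, file_count = summarize(record)
        print(f"{identifier}\t{file_count}")
```

Saved under a name such as summarize.py (hypothetical), the script can sit at the end of a pipeline that reads ia-mine's stdout. Lines that are empty or not valid JSON are skipped rather than treated as fatal, so one bad record does not stop a long run.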

Developers

Please report any bugs or issues on GitHub: https://github.com/jjjake/iamine


