
A Python library for interacting with the Archive-It API


🚨 THIS LIBRARY IS UNDER ACTIVE DEVELOPMENT. USE AT YOUR OWN RISK. 🚨

📦 Pyarchiveit

Pyarchiveit is a Python library for working with the Internet Archive's Archive-It API. It provides a simple interface for managing the seeds and collections in Archive-It accounts.

✨ Features

  • Create and update seeds with metadata validation
  • Retrieve seed lists with their metadata for single or multiple collections

📥 Installation

You can install the library using pip:

pip install pyarchiveit

Or use uv if you have it installed:

uv add pyarchiveit

💡 Example usage

First, initialize the Archive-It API client with your account credentials.

from pyarchiveit import ArchiveItAPI

# Initialize the Archive-It API client with your credentials
archive_it_client = ArchiveItAPI(
    account_name='your_username',
    account_password='your_password'
)
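Hardcoding a password in a script makes it easy to leak, for example by committing it to version control. The minimal sketch below assumes you keep your credentials in environment variables named `ARCHIVEIT_USERNAME` and `ARCHIVEIT_PASSWORD`. These names, and the helper `load_archiveit_credentials`, are hypothetical choices for this example; the library does not read them on its own. The helper simply builds the keyword arguments shown above.

```python
import os


def load_archiveit_credentials(env=None):
    """Hypothetical helper: read Archive-It credentials from environment variables.

    The variable names ARCHIVEIT_USERNAME / ARCHIVEIT_PASSWORD are assumptions
    made for this example. The returned keys match the keyword arguments used
    with ArchiveItAPI in the README.
    """
    env = os.environ if env is None else env
    required = ("ARCHIVEIT_USERNAME", "ARCHIVEIT_PASSWORD")
    # Collect every missing (or empty) variable so the error lists all of them at once
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "account_name": env["ARCHIVEIT_USERNAME"],
        "account_password": env["ARCHIVEIT_PASSWORD"],
    }
```

With the variables set, the client can then be created with `ArchiveItAPI(**load_archiveit_credentials())`.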

To create a new seed with metadata:

# Create a new seed with metadata
metadata = [
    {"value": "Example Metadata 1"},
    {"value": "Example Metadata 2"}
]
new_seed = archive_it_client.create_seed(
    collection_id=123456,
    url='http://example.com',
    crawl_definition_id=789012,
    other_params=None,
    metadata=metadata
)
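In these examples, metadata is passed as a list of dictionaries, each with a `"value"` key. The library advertises its own metadata validation, but its exact rules are not described here. The sketch below is a hypothetical pre-flight check, not part of pyarchiveit. It assumes only the shape used in this README (a non-empty list of dicts, each with a non-empty string `"value"`) and could be run before calling `create_seed`:

```python
def check_metadata(metadata):
    """Hypothetical pre-flight check for the metadata shape used in this README.

    Returns a list of problem descriptions. An empty list means the input
    matches the assumed shape: a non-empty list of {"value": <non-empty str>}.
    """
    if not isinstance(metadata, list) or not metadata:
        return ["metadata must be a non-empty list"]
    problems = []
    for index, entry in enumerate(metadata):
        if not isinstance(entry, dict):
            problems.append(f"entry {index} is not a dict")
            continue
        value = entry.get("value")
        # Reject missing, non-string, or whitespace-only values
        if not isinstance(value, str) or not value.strip():
            problems.append(f"entry {index} has no non-empty 'value' string")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets you report every malformed entry at once.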

To update an existing seed's metadata:

# Update an existing seed's metadata
updated_metadata = [
    {"value": "Updated Metadata 1"},
    {"value": "Updated Metadata 2"}
]
updated_seed = archive_it_client.update_seed_metadata(
    seed_id=123456,
    metadata=updated_metadata
)
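To update several seeds, you can call `update_seed_metadata` in a loop using the keyword arguments shown above. The minimal sketch below assumes a mapping from seed ID to a list of plain strings. The helper `update_many_seeds` is hypothetical and not part of pyarchiveit. It makes one API call per seed and wraps each string into the `[{"value": ...}]` shape used in this README.

```python
def update_many_seeds(client, values_by_seed):
    """Hypothetical helper: update metadata for several seeds, one call per seed.

    `client` is expected to expose update_seed_metadata(seed_id=..., metadata=...)
    as in the README example. `values_by_seed` maps a seed ID to a list of strings.
    Returns a dict mapping each seed ID to whatever the client returned for it.
    """
    results = {}
    for seed_id, values in values_by_seed.items():
        # Wrap plain strings into the list-of-dicts metadata shape
        metadata = [{"value": value} for value in values]
        results[seed_id] = client.update_seed_metadata(seed_id=seed_id, metadata=metadata)
    return results
```

For example, `update_many_seeds(archive_it_client, {111: ["Updated Metadata 1"], 222: ["Updated Metadata 2"]})`, where `111` and `222` are placeholder seed IDs.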

To retrieve the seeds of one or more collections:

# Get seed list of a collection
seeds = archive_it_client.get_seeds(collection_ids=123456)

# Or get seeds from multiple collections
seeds = archive_it_client.get_seeds(collection_ids=[123456, 789012])
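As the two calls above show, `collection_ids` accepts either a single ID or a list. If your own code reads collection IDs from user input or a config file, a small helper can accept the same two forms, drop duplicates while preserving order, and reject non-integer IDs before anything is sent. The helper `normalize_collection_ids` below is a hypothetical sketch and not part of pyarchiveit.

```python
def normalize_collection_ids(collection_ids):
    """Hypothetical helper: turn an int or an iterable of ints into a de-duplicated list."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(collection_ids, bool):
        raise TypeError("collection IDs must be integers")
    if isinstance(collection_ids, int):
        return [collection_ids]
    normalized = []
    seen = set()
    for cid in collection_ids:
        if isinstance(cid, bool) or not isinstance(cid, int):
            raise TypeError(f"collection ID {cid!r} is not an integer")
        if cid not in seen:  # keep the first occurrence, preserving input order
            seen.add(cid)
            normalized.append(cid)
    return normalized
```

For example, `archive_it_client.get_seeds(collection_ids=normalize_collection_ids([123456, 789012, 123456]))` requests each collection only once.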

⚫ Issues

For questions or support, please open an issue on the GitHub repository.

🖊️ Author

Ken Lui - Data Curation Specialist at Map & Data Library, University of Toronto

📄 License

This project is licensed under the GNU GPLv3 - see the LICENSE file for details.

Download files

Source Distribution

pyarchiveit-0.2.0.tar.gz (4.7 kB)

Uploaded Source

Built Distribution


pyarchiveit-0.2.0-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file pyarchiveit-0.2.0.tar.gz.

File metadata

  • Download URL: pyarchiveit-0.2.0.tar.gz
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.7

File hashes

Hashes for pyarchiveit-0.2.0.tar.gz
SHA256: ba2183e934f6650eb23107448f4e67e57f1f4b51282f75280be2244873071193
MD5: 66d6b062e23db16bccdb73e3f736c8c3
BLAKE2b-256: 40e1051ef1775098e4d1c321bac85cd86ac4e1a8ec8599c2fc6958ac3456e340


File details

Details for the file pyarchiveit-0.2.0-py3-none-any.whl.

File hashes

Hashes for pyarchiveit-0.2.0-py3-none-any.whl
SHA256: 3551880a726f59afa76357a7e138ccf57a7b4f726971c82ecddbc175f62e7d6d
MD5: 1d23d5c978505234cd121af5f3c1084c
BLAKE2b-256: 0beb84d6f9ac45218ffc92fa74809e26e3ee8ef9950f5af22a7bce84d98360d6

