Tools for interacting with Archive-It.

Project description

AIU

AIU is a Python library for extracting information from web archive collections. The work is done through different classes, each specific to a different web archive collection host. Each class performs screen-scraping and API analysis (if available) in order to acquire general collection metadata, seed lists, and seed metadata.

Installation

This package requires Python 3 and is called aiu on PyPI. Installation is handled via pip:

pip install aiu
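To confirm that the install succeeded without importing the package or touching the network, one option is to ask the standard library for the installed version. This is a minimal sketch; the helper name `installed_version` is an illustrative assumption and is not part of aiu.

```python
from importlib import metadata  # in the standard library since Python 3.8


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string when aiu is installed, otherwise None.
print(installed_version("aiu"))
```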

Using the `ArchiveItCollection` class

The class named ArchiveItCollection has many methods for extracting information about an Archive-It collection using its collection identifier.

For example, to use IPython to get information about Archive-It collection number 5728, one can execute the following:

In [1]: from aiu import ArchiveItCollection

In [2]: aic = ArchiveItCollection(5728)

In [3]: aic.get_collection_name()
Out[3]: 'Social Media'

In [4]: aic.get_collectedby()
Out[4]: 'Willamette University'

In [5]: aic.get_description()
Out[5]: 'Social media content created by Willamette University.'

In [6]: aic.get_collection_uri()
Out[6]: 'https://archive-it.org/collections/5728'

In [7]: aic.get_archived_since()
Out[7]: 'Apr, 2015'

In [8]: aic.is_private()
Out[8]: False

In [9]: len(aic.list_seed_uris())
Out[9]: 113

In [10]: aic.list_seed_uris()[0]
Out[10]: 'http://blog.willamette.edu/mba/'

In [11]: seed_url = aic.list_seed_uris()[0]

In [12]: aic.get_seed_metadata(seed_url)
Out[12]:
{'collection_web_pages': [{'title': 'Willamette MBA Blog',  
   'description': ['Blog for the Willamette University Atkinson Graduate School of Management']}]}

From this session we now know that the collection's name is Social Media, it was collected by Willamette University, it has been archived since April 2015, it is not private, and it has 113 seeds.
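The values gathered one call at a time in the session above could also be collected into a single dictionary. The sketch below uses only method names that appear in that session. The function `summarize_collection` and the offline stand-in object are hypothetical, added so the example runs without network access; with the library installed, an `ArchiveItCollection(5728)` instance would be passed instead.

```python
from types import SimpleNamespace


def summarize_collection(collection):
    """Gather the values queried in the session above into one dictionary."""
    seed_uris = collection.list_seed_uris()
    return {
        "name": collection.get_collection_name(),
        "collected_by": collection.get_collectedby(),
        "uri": collection.get_collection_uri(),
        "archived_since": collection.get_archived_since(),
        "private": collection.is_private(),
        "seed_count": len(seed_uris),
    }


# Hypothetical offline stand-in that mimics a few ArchiveItCollection methods.
# Return values are copied from the session above; only the first seed is listed.
stand_in = SimpleNamespace(
    get_collection_name=lambda: "Social Media",
    get_collectedby=lambda: "Willamette University",
    get_collection_uri=lambda: "https://archive-it.org/collections/5728",
    get_archived_since=lambda: "Apr, 2015",
    is_private=lambda: False,
    list_seed_uris=lambda: ["http://blog.willamette.edu/mba/"],
)

summary = summarize_collection(stand_in)
print(summary["name"], summary["seed_count"])
```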

Examine the source in aiu/archiveit_collection.py for a full list of methods to use with this class.
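The seed metadata shown in the session is a nested dictionary whose `collection_web_pages` entry holds a list of pages, each with a title and a list of description strings. A small helper can flatten that shape into simple pairs. This is a sketch based only on the single output shown above; the helper name `flatten_seed_metadata` is an assumption, and other seeds may carry different or additional keys.

```python
def flatten_seed_metadata(seed_metadata):
    """Return (title, description) pairs from a seed metadata dictionary."""
    pairs = []
    for page in seed_metadata.get("collection_web_pages", []):
        title = page.get("title", "")
        description = page.get("description", [])
        # The session shows a list of strings; tolerate a bare string as well.
        if isinstance(description, str):
            description = [description]
        pairs.append((title, " ".join(description)))
    return pairs


# Input copied from the get_seed_metadata() output in the session above.
example = {
    "collection_web_pages": [
        {
            "title": "Willamette MBA Blog",
            "description": [
                "Blog for the Willamette University Atkinson Graduate School of Management"
            ],
        }
    ]
}

pairs = flatten_seed_metadata(example)
for title, description in pairs:
    print(f"{title}: {description}")
```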

Using the `TroveCollection` class

The class named TroveCollection has many methods for extracting information about a National Library of Australia (NLA) Trove collection using its collection identifier. Note: Because NLA has different collection policies than Archive-It, not all methods, or their outputs, are mirrored between TroveCollection and ArchiveItCollection.

For example, to use IPython to get information about Trove collection number 13742, one can execute the following:

In [1]: from aiu import TroveCollection

In [2]: tc = TroveCollection(13742)

In [3]: tc.get_collection_name()
Out[3]: 'Iconic Australian Brands'

In [4]: tc.get_collectedby()
Out[4]:
{'National Library of Australia': 'http://www.nla.gov.au/',
 'State Library of Queensland': 'http://www.slq.qld.gov.au/'}

In [5]: tc.get_archived_since()
Out[5]: 'Feb 2000'

In [6]: tc.get_archived_until()
Out[6]: 'Mar 2021'

In [7]: len(tc.list_seed_uris())
Out[7]: 64

In [8]: tc.get_breadcrumbs()
Out[8]: [0, 15023]

From this session we now know that the collection's name is Iconic Australian Brands, that it was collected by the National Library of Australia and the State Library of Queensland, that it has been archived since Feb 2000 and contains mementos up to Mar 2021, and that it has 64 seeds. It is a subcollection of the collections with identifiers 0 and 15023, the breadcrumbs that lead to this collection.
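The two sessions show one concrete difference in outputs: `get_collectedby()` returns a plain string for the Archive-It collection and a dictionary of names to URLs for the Trove collection. Code that handles both kinds of collection might normalize the value first. The helper below is a sketch under that observation; `collector_names` is a hypothetical name, not part of aiu.

```python
def collector_names(collected_by):
    """Return a list of collector names from either return shape seen above."""
    if isinstance(collected_by, str):
        # ArchiveItCollection session: a single organization name.
        return [collected_by]
    # TroveCollection session: a dict mapping organization name to URL.
    return list(collected_by)


# Values copied from the two sessions above.
archive_it_value = "Willamette University"
trove_value = {
    "National Library of Australia": "http://www.nla.gov.au/",
    "State Library of Queensland": "http://www.slq.qld.gov.au/",
}

print(collector_names(archive_it_value))
print(collector_names(trove_value))
```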

Examine the source in aiu/trove_collection.py for a full list of methods to use with this class.

Using the `PandoraCollection` class

The class named PandoraCollection has many methods for extracting information about a National Library of Australia (NLA) Pandora collection using its collection identifier. Note: Because NLA has different collection policies than Archive-It, not all methods, or their outputs, are mirrored among TroveCollection, ArchiveItCollection, and PandoraCollection.

For example, to use IPython to get information about Pandora collection number 12022, one can execute the following:

In [1]: from aiu import PandoraCollection

In [2]: pc = PandoraCollection(12022)

In [3]: pc.get_collection_name()
Out[3]: 'Fact sheets (Victoria. EPA Victoria) - Australian Internet Sites'

In [4]: pc.get_title_pages()
Out[4]:
{'136318': ('https://webarchive.nla.gov.au/tep/136318', 'Air'),
 '136347': ('https://webarchive.nla.gov.au/tep/136347',
  'How to reduce noise from your business'),
 '136317': ('https://webarchive.nla.gov.au/tep/136317', 'Land'),
 '136346': ('https://webarchive.nla.gov.au/tep/136346', 'Landfill gas'),
 '136314': ('https://webarchive.nla.gov.au/tep/136314', 'Litter'),
 '136316': ('https://webarchive.nla.gov.au/tep/136316',
  'Noise (EPA fact sheet)'),
 '136319': ('https://webarchive.nla.gov.au/tep/136319', 'Odour'),
 '136312': ('https://webarchive.nla.gov.au/tep/136312', 'Waste'),
 '136313': ('https://webarchive.nla.gov.au/tep/136313', 'Water')}

In [5]: len(pc.list_memento_urims())
Out[5]: 10

In [6]: pc.list_seed_uris()
Out[6]:
['http://www.epa.vic.gov.au/~/media/Publications/1465.pdf',
 'http://www.epa.vic.gov.au/~/media/Publications/1481.pdf',
 'http://www.epa.vic.gov.au/~/media/Publications/1466.pdf',
 'http://www.epa.vic.gov.au/~/media/Publications/1479.pdf',
 'http://www.epa.vic.gov.au/~/media/Publications/1486%201.pdf',
 'http://www.epa.vic.gov.au/~/media/Publications/1467.pdf',
 'http://www.epa.vic.gov.au/~/media/Publications/1468.pdf',
 'http://www.epa.vic.gov.au/~/media/Publications/1469.pdf',
 'http://www.epa.vic.gov.au/~/media/Publications/1470.pdf']

In [7]: pc.get_collectedby()
Out[7]: {'State Library of Victoria': 'http://www.slv.vic.gov.au/'}

Examine the source in aiu/pandora_collection.py for a full list of methods to use with this class.
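In the session above, `get_title_pages()` returns a dictionary keyed by title identifier, with a `(url, title)` tuple as each value. When the titles matter more than the identifiers, the mapping can be turned around. The sketch below assumes exactly that shape; `titles_to_urls` is a hypothetical helper, and the input is a three-entry subset of the session output.

```python
def titles_to_urls(title_pages):
    """Map each title to its title-page URL, ordered alphabetically by title."""
    ordered = sorted(title_pages.values(), key=lambda pair: pair[1])
    return {title: url for url, title in ordered}


# A subset of the get_title_pages() output shown in the session above.
title_pages = {
    "136313": ("https://webarchive.nla.gov.au/tep/136313", "Water"),
    "136318": ("https://webarchive.nla.gov.au/tep/136318", "Air"),
    "136317": ("https://webarchive.nla.gov.au/tep/136317", "Land"),
}

by_title = titles_to_urls(title_pages)
for title, url in by_title.items():
    print(title, url)
```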

Using the `PandoraSubject` class

The class named PandoraSubject has many methods for extracting information about a National Library of Australia (NLA) Pandora subject using its subject identifier. Note: Because NLA has different collection policies than Archive-It, not all methods, or their outputs, are mirrored among TroveCollection, ArchiveItCollection, PandoraCollection, and PandoraSubject.

For example, to use IPython to get information about Pandora subject number 83, one can execute the following:

In [1]: from aiu import PandoraSubject

In [2]: ps = PandoraSubject(83)

In [3]: ps.get_subject_name()
Out[3]: 'Humanities'

In [4]: len(ps.get_title_pages())
Out[4]: 71

In [5]: len(ps.list_memento_urims())
Out[5]: 246

In [6]: len(ps.list_seed_uris())
Out[6]: 71

In [7]: ps.subject_id
Out[7]: '83'

In [8]: ps.get_collectedby()
Out[8]:
{'National Library of Australia': 'http://www.nla.gov.au/',
 'Australian Institute of Aboriginal and Torres Strait Islander Studies': 'http://www.aiatsis.gov.au',
 'State Library of New South Wales': 'http://www.sl.nsw.gov.au/',
 'State Library of Victoria': 'http://www.slv.vic.gov.au/',
 'State Library of Western Australia': 'http://www.slwa.wa.gov.au/',
 'State Library of South Australia': 'http://www.slsa.sa.gov.au/'}

In [9]: ps.list_subcategories()
Out[9]: ['84', '85', '86']

Examine the source in aiu/pandora_collection.py for a full list of methods to use with this class.
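Because `list_subcategories()` returns subject identifiers, a subject and its descendants could in principle be walked recursively. The sketch below is hypothetical: `walk_subjects` and the offline stand-in are not part of aiu, the child names are placeholders rather than real subject names, and it assumes each subcategory identifier is itself a valid subject identifier. Only `get_subject_name()` and `list_subcategories()` from the session above are used.

```python
from types import SimpleNamespace


def walk_subjects(subject_id, make_subject, seen=None):
    """Return {subject_id: subject_name} for a subject and all its descendants."""
    seen = {} if seen is None else seen
    key = str(subject_id)
    if key in seen:
        return seen  # already visited; also guards against cycles
    subject = make_subject(key)
    seen[key] = subject.get_subject_name()
    for child_id in subject.list_subcategories():
        walk_subjects(child_id, make_subject, seen)
    return seen


# Offline stand-in tree. Subject 83 and its subcategory ids come from the
# session above; the child names are placeholders, not real subject names.
FAKE_TREE = {
    "83": ("Humanities", ["84", "85", "86"]),
    "84": ("placeholder name for 84", []),
    "85": ("placeholder name for 85", []),
    "86": ("placeholder name for 86", []),
}


def make_stand_in(subject_id):
    name, children = FAKE_TREE[subject_id]
    return SimpleNamespace(
        get_subject_name=lambda: name,
        list_subcategories=lambda: children,
    )


names = walk_subjects(83, make_stand_in)
print(names)
```

With the library installed, a factory such as `lambda sid: PandoraSubject(int(sid))` could stand in for `make_stand_in`, under the assumption stated above; each call would then fetch pages from the NLA site.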

Release history

0.2.4 (this version): Nov 5, 2021
0.2.3: May 6, 2021
0.2.1: Apr 13, 2021
0.2.0: Feb 28, 2021
0.1.1a6 (pre-release): Jan 14, 2021
0.1.1a4 (pre-release): Mar 3, 2020
0.1.1a1 (pre-release): Jul 20, 2018
0.1.0a2 (pre-release): Jul 2, 2018
0.1.0a1 (pre-release): Jun 28, 2018

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

aiu-0.2.4.tar.gz (26.7 kB)

Uploaded Nov 5, 2021 Source

Built Distribution

aiu-0.2.4-py3-none-any.whl (33.4 kB)

Uploaded Nov 5, 2021 Python 3

File details

Details for the file aiu-0.2.4.tar.gz.

File metadata

Download URL: aiu-0.2.4.tar.gz
Upload date: Nov 5, 2021
Size: 26.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.10

File hashes

Hashes for aiu-0.2.4.tar.gz
Algorithm	Hash digest
SHA256	`9d11e9b872829efe112fdf5ec1055c932172420bae59529aaa36c6e57c60f8a5`
MD5	`84fddc30c9c2d3936daf03a6c2e23905`
BLAKE2b-256	`fac9f71d475511e343100ea8aaf0dca6236d0b152407133b09382399d27ae518`

File details

Details for the file aiu-0.2.4-py3-none-any.whl.

File metadata

Download URL: aiu-0.2.4-py3-none-any.whl
Upload date: Nov 5, 2021
Size: 33.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.58.0 CPython/3.8.10

File hashes

Hashes for aiu-0.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`00e7e81ec84cfaf012c478afda657bfca881ceafe10b5466f19bb91ccac369ba`
MD5	`0efa953f4c8a17e483edb75c6a275504`
BLAKE2b-256	`fd99e8b52fd3deacce04c074e0f97acf567d0d1aa7f1c1b139c332a2832c5e8f`
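
The SHA256 digests listed above can be compared against a downloaded file with the standard library. This is a minimal sketch; the function names are illustrative, and the demonstration runs on in-memory bytes so it needs no download.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Return the hex SHA256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(stream, published_hex):
    """Compare a stream's SHA256 digest with a published hex digest."""
    return sha256_of_stream(stream) == published_hex.strip().lower()


# Offline demonstration: the well-known SHA256 digest of empty input.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
empty_ok = matches_published_digest(io.BytesIO(b""), EMPTY_SHA256)
print(empty_ok)
```

For a real check, the stream would be the downloaded file opened in binary mode, for example `open("aiu-0.2.4.tar.gz", "rb")`, and the published value would be the SHA256 digest listed for that file, `9d11e9b872829efe112fdf5ec1055c932172420bae59529aaa36c6e57c60f8a5`.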
