A reporting system for Archivematica using data from AIPs.

Owner: Archivematica

Maintainers: jhsimpson, replaceafill

Project description

About

AIPscan was developed to provide a more in-depth reporting solution for Archivematica users. It crawls METS files from AIPs in the Archivematica Storage Service to generate tabular and visual reports about repository holdings. It is designed to run as a stand-alone add-on to Archivematica. It only needs a valid Storage Service API key to fetch source data.

License

Apache License Version 2.0 Copyright Artefactual Systems Inc (2021)

Screenshots
Installation
Usage

Screenshots

AIPscan fetch job

screencap1

Finding an AIP

screencap2

Viewing an AIP

screencap3

Selecting a report

screencap4

Example: pie chart "format types" report

screencap5

Example: tabular "largest files" report

screencap6

Installation

AIPscan is a web-based application that is built using the Python Flask micro-framework. Below are the developer quickstart instructions. See INSTALL for production deployment instructions. See CONTRIBUTING for guidelines on how to contribute to the project, including how to create a new AIPscan report.

AIPscan Flask server

Install uv if it's not already available
Clone files and cd to directory: git clone https://github.com/artefactual-labs/AIPscan && cd AIPscan
Install all project dependencies (including development extras): uv sync
Bundle static assets: npm run build
Enable DEBUG mode if desired for development: export FLASK_CONFIG=dev
In a terminal window, start the Flask server: uv run python -m AIPscan.run
Confirm that the Flask server and AIPscan application are up and running at localhost:5000 in your browser. You should see a blank AIPscan page like this:

screencap5

Typesense integration

AIPscan can optionally be run using Typesense as a report data source, potentially reducing the time to generate reports. If Typesense is installed and enabled then AIPscan data will be automatically indexed after each fetch job and report queries will pull data from Typesense rather than the application's database.

Typesense can be installed in a variety of ways, as detailed on the Typesense website.

Configuration

Typesense configuration is done using the following environment variables:

Typesense API key (required): TYPESENSE_API_KEY
Typesense host: TYPESENSE_HOST (default "localhost")
Typesense port: TYPESENSE_PORT (default "8108")
Typesense URL protocol: TYPESENSE_PROTOCOL (default "http")
Typesense timeout (in seconds): TYPESENSE_TIMEOUT_SECONDS (default "30")
Typesense collection prefix: TYPESENSE_COLLECTION_PREFIX (default "aipscan_")

Typesense support is enabled by setting TYPESENSE_API_KEY.

Here's an example:

TYPESENSE_API_KEY="xOxOxOxO" python -m AIPscan.run
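The configuration rules above can be sketched in Python. This is only an illustration of reading the documented environment variables with their documented defaults; the `load_typesense_settings` helper is hypothetical and is not AIPscan's actual configuration code. Treating an empty TYPESENSE_API_KEY as "not enabled" is also an assumption.

```python
import os

# Documented defaults for the optional Typesense settings.
DEFAULTS = {
    "TYPESENSE_HOST": "localhost",
    "TYPESENSE_PORT": "8108",
    "TYPESENSE_PROTOCOL": "http",
    "TYPESENSE_TIMEOUT_SECONDS": "30",
    "TYPESENSE_COLLECTION_PREFIX": "aipscan_",
}


def load_typesense_settings(environ=None):
    """Return a settings dict, or None when Typesense is not enabled.

    Hypothetical helper: Typesense is treated as enabled only when
    TYPESENSE_API_KEY is set to a non-empty value (an assumption).
    """
    environ = os.environ if environ is None else environ
    api_key = environ.get("TYPESENSE_API_KEY")
    if not api_key:
        return None
    # Fall back to the documented default for any variable that is unset.
    settings = {name: environ.get(name, default) for name, default in DEFAULTS.items()}
    settings["TYPESENSE_API_KEY"] = api_key
    return settings


if __name__ == "__main__":
    print(load_typesense_settings({"TYPESENSE_API_KEY": "xOxOxOxO"}))
```

Passing a plain dictionary instead of `os.environ` makes the behaviour easy to check without changing the real environment.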

Related CLI tools

Two CLI tools exist: one to manually index AIPscan's database and one to display a summary of the Typesense index.

Index AIPscan data: tools/index-refresh
Display a summary of the Typesense index: tools/index-summary

Background workers

Crawling and parsing many Archivematica AIP METS XML files at a time is resource intensive. Therefore, AIPscan uses the RabbitMQ message broker and the Celery task manager to coordinate this activity as background worker tasks. Both RabbitMQ and Celery must be running properly before attempting a METS fetch job.

RabbitMQ

You can download and install the RabbitMQ server directly on your local or cloud machine, or you can run it in either location from a Docker container.

Docker installation

docker run --rm \
  -it \
  --hostname my-rabbit \
  -p 15672:15672 \
  -p 5672:5672 rabbitmq:3-management

Download and install

Download the RabbitMQ installer.

In another terminal window, start the RabbitMQ queue manager:

export PATH=$PATH:/usr/local/sbin
sudo rabbitmq-server

RabbitMQ dashboard

The RabbitMQ dashboard is available at http://localhost:15672/
username: guest / password: guest
AIPscan connects to the RabbitMQ queue on port :5672.
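Before starting a fetch job it can help to confirm that something is listening on the two ports mentioned above. The following is a minimal sketch using only the Python standard library; the `port_is_open` helper is hypothetical and is not part of AIPscan. A successful TCP connection only shows that a service is listening on the port, not that RabbitMQ is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    # 5672 is the queue port and 15672 the dashboard port described above.
    for port in (5672, 15672):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"localhost:{port} is {state}")
```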

Celery

Celery is installed when you run uv sync.

To start up Celery workers that are ready to receive tasks from RabbitMQ:

Open a new terminal tab or window.
Navigate to the AIPscan root project directory.
Install dependencies if you have not already (uv sync).
Enter the following command: uv run celery -A AIPscan.worker.celery worker --loglevel=info
You should see terminal output similar to this to indicate that the Celery task queue is ready:

screencap6

Development

Requires Docker CE and Docker Compose.

Clone the repository and go to its directory:

git clone https://github.com/artefactual-labs/AIPscan
cd AIPscan

Build images, initialize services, etc.:

docker-compose up -d

Optional: attach AIPscan to the Docker Archivematica container network directly:

docker-compose -f docker-compose.yml -f docker-compose.am-network.yml up -d

In this case, the AIPscan Storage Service record's URL field can be set with the Storage Service container name:

http://archivematica-storage-service:8000

Access the logs:

docker-compose logs -f aipscan rabbitmq celery-worker

Shut down the AIPscan Docker containers:

docker-compose down

Shut down the AIPscan Docker containers and remove the rabbitmq volumes:

docker-compose down --volumes

Production deployments

For production deployments, it's recommended to use MySQL instead of SQLite. To do so, export an environment variable named SQLALCHEMY_DATABASE_URI for both the Celery and AIPscan services that points to MySQL, using the format mysql+pymysql://user:pass@host/db.

When the SQLALCHEMY_DATABASE_URI environment variable is set, its value is output during startup of both AIPscan and the Celery workers.
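As an illustration of the URI format, the sketch below assembles a value for SQLALCHEMY_DATABASE_URI from its parts. The `build_mysql_uri` helper is hypothetical and not part of AIPscan. Percent-encoding the user name and password with `urllib.parse.quote_plus` is an added precaution for credentials that contain characters such as "@" or "/", not something the instructions above require.

```python
from urllib.parse import quote_plus


def build_mysql_uri(user, password, host, database):
    """Build a URI in the mysql+pymysql://user:pass@host/db format.

    The user name and password are percent-encoded so that reserved
    characters in them do not break the URI.
    """
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )


if __name__ == "__main__":
    print(build_mysql_uri("user", "pass", "host", "db"))
```

The printed value can then be exported as SQLALCHEMY_DATABASE_URI for both services.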

SQLite databases can be migrated using sqlite3mysql:

uv tool install sqlite3-to-mysql
sqlite3mysql -f aipscan.db -d <mysql database name> -u <mysql database user> --mysql-password <mysql database password>

Tools

The tools directory contains scripts that can be run by developers and system administrators.

Test data generator

The test data generator tool, tools/generate-test-data, populates AIPscan's database with randomly generated example data.

Fetch script

The AIP fetch tool, tools/fetch_aips, allows all, or a subset, of a storage service's packages to be fetched by AIPscan. Any AIPs not yet fetched by AIPscan will be added but no duplicates will be added if an AIP has already been fetched. Any AIPs that have been newly marked as deleted will be removed from AIPscan.
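The add/skip/remove behaviour described above amounts to reconciling two sets of AIPs. The sketch below shows that idea only; the `reconcile` function and the `(uuid, is_deleted)` pairs are hypothetical simplifications and do not reflect how tools/fetch_aips is actually implemented.

```python
def reconcile(fetched_uuids, storage_packages):
    """Work out which AIPs to add and which to remove.

    fetched_uuids: set of AIP UUIDs already fetched.
    storage_packages: iterable of (uuid, is_deleted) pairs, a simplified
    stand-in for the storage service's package list.
    """
    to_add, to_remove = set(), set()
    for uuid, is_deleted in storage_packages:
        if is_deleted:
            # Only AIPs that were fetched earlier need to be removed.
            if uuid in fetched_uuids:
                to_remove.add(uuid)
        elif uuid not in fetched_uuids:
            # Already-fetched AIPs are skipped, so no duplicates arise.
            to_add.add(uuid)
    return to_add, to_remove


if __name__ == "__main__":
    already_fetched = {"a", "b"}
    packages = [("a", False), ("b", True), ("c", False)]
    print(reconcile(already_fetched, packages))
```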

When using the script, the storage service's list of packages can optionally be grouped into "pages", with each "page" containing a number of packages (specified by a command-line argument). For example, a storage service with 150 packages could be fetched as three pages of 50 packages. Likewise, a storage service with anything from 101 to 149 packages could also be fetched as three pages of 50 packages, with the last page only partly filled.
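The paging arithmetic in this example can be checked with a short sketch. The function names are hypothetical and are not taken from the fetch script; they only reproduce the calculation described above.

```python
def page_count(total_packages, packages_per_page):
    """Number of pages needed to cover every package (ceiling division)."""
    if packages_per_page <= 0:
        raise ValueError("packages_per_page must be positive")
    return (total_packages + packages_per_page - 1) // packages_per_page


def page_bounds(page, packages_per_page, total_packages):
    """Return (start, end) list indexes for a 1-based page number."""
    start = (page - 1) * packages_per_page
    end = min(start + packages_per_page, total_packages)
    return start, end


if __name__ == "__main__":
    for total in (101, 149, 150):
        print(total, "packages ->", page_count(total, 50), "pages of 50")
```

With 101 packages and pages of 50, the third page covers only the final package.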

If using cron, or some other scheduler, to automatically fetch AIPs using this tool consider using the --lockfile option to prevent overlapping executions of the tool.
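To show why a lock file prevents overlapping runs, here is a minimal sketch of the general technique: the lock file is created atomically, and a second attempt fails while the file exists. This is not a description of how the --lockfile option is implemented in the tool, and the helper names are hypothetical.

```python
import os
import tempfile


def acquire_lock(path):
    """Try to create the lock file atomically; return True on success."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def release_lock(path):
    """Remove the lock file so that the next run can start."""
    os.remove(path)


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "fetch_aips.lock")
    print(acquire_lock(lock_path))  # the first run creates the lock
    print(acquire_lock(lock_path))  # an overlapping run is refused
    release_lock(lock_path)
```

A simple scheme like this leaves a stale lock behind if a run crashes before releasing it, which is one reason to rely on the tool's own option rather than a hand-rolled wrapper.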

Cached package list

The script downloads a storage service's list of packages and caches it so that paging, if used, remains consistent between script runs. Each cached list is identified by a "session descriptor". A session descriptor is specified by whoever runs the script and can be any alphanumeric identifier without spaces or special characters. It is used to name the directory in which fetch-related files are created.

Below is the directory structure that results when the session descriptor "somedescriptor" is used, showing where the packages.json file, which contains the list of a storage service's packages, is placed.

AIPscan/Aggregator/downloads/somedescriptor
├── mets
│   └── batch
└── packages
    └── packages.json
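The relationship between a session descriptor and the cached file location can be sketched as follows. The `packages_json_path` helper is hypothetical; only the directory layout and the "alphanumeric, no spaces or special characters" rule come from the description above.

```python
import re
from pathlib import PurePosixPath

# Base directory taken from the layout shown above.
DOWNLOADS_ROOT = PurePosixPath("AIPscan/Aggregator/downloads")


def packages_json_path(session_descriptor):
    """Return where packages.json is cached for a session descriptor."""
    # Accept only alphanumeric descriptors, per the rule described above.
    if not re.fullmatch(r"[A-Za-z0-9]+", session_descriptor):
        raise ValueError("session descriptor must be alphanumeric")
    return DOWNLOADS_ROOT / session_descriptor / "packages" / "packages.json"


if __name__ == "__main__":
    print(packages_json_path("somedescriptor"))
```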

NOTE: Each run of the script will generate a new fetch job database entry. These individual fetch jobs shouldn't be deleted, via the AIPscan web UI, until all fetch jobs (for each "page") have run. Otherwise the cached list of packages will be deleted and the package list will have to be downloaded again.

Running tools

These should be run using the same system user and virtual environment that AIPscan is running under.

Here's how you would run the generate-test-data tool, for example:

cd <path to AIPscan base directory>
sudo -u <AIPscan system user> /bin/bash
source <path to AIPscan virtual environment>/bin/activate
./tools/generate-test-data

In order to display a tool's CLI arguments and options, enter <path to tool> --help.

Database documentation generator

To generate database documentation using SchemaSpy, run via Docker, enter the following:

sudo make schema-docs

Database documentation will be written to the output directory and can be viewed in a web browser by opening index.html.

Usage

Ensure that the Flask Server, RabbitMQ server, and Celery worker queue are up and running.
Go to localhost:5000 in your browser.
Select "New Storage Service"
Add an Archivematica Storage Service record, including its URL (e.g. https://amdemo.artefactual.com:8000) and API key.
Select "New Fetch Job"
Check the black and green terminal to confirm that AIPscan successfully connected to the Archivematica Storage Service, that it received the lists of available packages from Archivematica, and that it has begun downloading and parsing the AIP METS files.
This could take a while (e.g. a few hours) depending on the total number of AIPs in your Storage Service and the size of your METS XML files. Therefore, if you have the option, it is recommended that you test AIPscan on a smaller subset of your full AIP holdings first. This should help you estimate the total time to run AIPscan against all packages in your Storage Service.
When the Fetch Job completes, select the "View AIPs" button, the "AIPs" menu, or the "Reports" menu to view information about your Archivematica content in a variety of layouts.

Release history

0.9.0a12 pre-release

May 13, 2026

0.9.0a11 pre-release

Dec 1, 2025

0.9.0a10 pre-release

Oct 30, 2025

0.9.0a9 pre-release

Oct 21, 2025

0.9.0a8 pre-release

Oct 19, 2025

0.9.0a7 pre-release

Oct 8, 2025

0.9.0a6 pre-release

Oct 8, 2025

0.9.0a5 pre-release

Oct 3, 2025

0.9.0a4 pre-release

Oct 3, 2025

0.9.0a3 pre-release

Sep 29, 2025

0.9.0a2 pre-release

Sep 29, 2025

0.9.0a1 pre-release

Sep 25, 2025

This version

0.9.0.dev10 pre-release

Oct 3, 2025

Download files

Source Distribution

aipscan-0.9.0.dev10.tar.gz (189.3 kB)

Uploaded Oct 3, 2025 Source

Built Distribution

aipscan-0.9.0.dev10-py3-none-any.whl (2.8 MB)

Uploaded Oct 3, 2025 Python 3

File details

Details for the file aipscan-0.9.0.dev10.tar.gz.

File metadata

Download URL: aipscan-0.9.0.dev10.tar.gz
Upload date: Oct 3, 2025
Size: 189.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aipscan-0.9.0.dev10.tar.gz
Algorithm	Hash digest
SHA256	`ab980a2dade54dc2b7783006e4ee178ec16e1da3ca87365c12d59452344b2582`
MD5	`dafd6588400a3518f816db39e5cfe845`
BLAKE2b-256	`e756b3e359fc14306a6b06a73df2e68bc3d26d9af03d6b3e7bf1a52e05783db4`

Provenance

The following attestation bundles were made for aipscan-0.9.0.dev10.tar.gz:

Publisher: release.yml on artefactual-labs/AIPscan

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aipscan-0.9.0.dev10.tar.gz
- Subject digest: ab980a2dade54dc2b7783006e4ee178ec16e1da3ca87365c12d59452344b2582
- Sigstore transparency entry: 582302327
- Sigstore integration time: Oct 3, 2025
Source repository:
- Permalink: artefactual-labs/AIPscan@0238ff1ba73779489277983d019ece92143dce93
- Branch / Tag: refs/heads/dev/release-improvements
- Owner: https://github.com/artefactual-labs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0238ff1ba73779489277983d019ece92143dce93
- Trigger Event: workflow_dispatch

File details

Details for the file aipscan-0.9.0.dev10-py3-none-any.whl.

File metadata

Download URL: aipscan-0.9.0.dev10-py3-none-any.whl
Upload date: Oct 3, 2025
Size: 2.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for aipscan-0.9.0.dev10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`21b02149753d3aebea8484a2cf25fecc313da998912c0b7c303e1c90f19c3d90`
MD5	`c580b3142eba4c17cb8c1f8991bfb81b`
BLAKE2b-256	`f13c8646bea6a422d9668fe672c569be1bac697a6faf0079be3263789679a76d`

Provenance

The following attestation bundles were made for aipscan-0.9.0.dev10-py3-none-any.whl:

Publisher: release.yml on artefactual-labs/AIPscan

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: aipscan-0.9.0.dev10-py3-none-any.whl
- Subject digest: 21b02149753d3aebea8484a2cf25fecc313da998912c0b7c303e1c90f19c3d90
- Sigstore transparency entry: 582302330
- Sigstore integration time: Oct 3, 2025
Source repository:
- Permalink: artefactual-labs/AIPscan@0238ff1ba73779489277983d019ece92143dce93
- Branch / Tag: refs/heads/dev/release-improvements
- Owner: https://github.com/artefactual-labs
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0238ff1ba73779489277983d019ece92143dce93
- Trigger Event: workflow_dispatch

aipscan 0.9.0.dev10
