
Interpretable clustering and graph-based visualization of painting collections


museum-map

museum-map is a Python package for interpretable clustering and graph-based visualization of painting collections.

The package is designed for exploratory work with art collections, especially in contexts where users need not only a clustering result, but also a visual and inspectable representation of relationships between artworks. The current implementation combines CLIP-based image embeddings with explicitly defined visual descriptors for color distribution and spatial composition, projects the resulting feature space, detects clusters, and exports an interactive similarity graph with painting thumbnails.

The intended audience includes researchers in digital humanities, museum professionals, curators, collection managers, and computational analysts working with visual collections.


Installation

Install from PyPI:

pip install museum-map

Or install the latest development version from GitHub:

pip install git+https://github.com/Frantsuzova/museum-map.git

For local development:

git clone https://github.com/Frantsuzova/museum-map.git
cd museum-map
pip install -e .

What the package does

Given a folder of painting images, museum-map:

  • computes semantic image embeddings using a CLIP image encoder
  • extracts palette-based descriptors
  • extracts composition-based descriptors
  • combines these signals into a shared representation
  • detects clusters in the collection
  • selects representative paintings from each cluster
  • builds an interactive HTML graph where:
    • each node is a painting
    • node image = painting thumbnail
    • node border color = cluster membership
    • edges = local similarity relationships
  • exports intermediate artifacts and a ready-to-share zip archive

This workflow is intended to support exploratory analysis of collections and to help identify non-obvious relationships between paintings.

Quick start

Minimal Python usage

from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="/path/to/paintings",
    output_dir="/path/to/output",
)

print(pipeline.graph_html_path_)
print(pipeline.export_zip_path_)

This is the simplest workflow: a single function call points the package at a folder of images and returns a pipeline object whose attributes give the paths to the exported interactive graph and the zip archive.

What input data can be used

The package currently expects a local folder of images.

Supported image extensions are:

  • .jpg
  • .jpeg
  • .png
  • .webp
  • .bmp
  • .tif
  • .tiff

Typical input scenarios

You can use:

  • a subset of a public art dataset such as WikiArt
  • a digitized museum collection exported as image files
  • a folder with paintings gathered for a pilot experiment
  • a thematic subcollection, for example portraits, landscapes, or one author’s works

Current assumptions about metadata

The current pipeline is image-first. It does not require metadata to run.

This means you can start with a plain folder of images:

my_collection/
├── painting_001.jpg
├── painting_002.jpg
├── painting_003.png
└── ...

If metadata such as artist, style, genre, accession number, or inventory ID is available, support for integrating it is planned for future versions. In the current scaffold, metadata fields with unknown values are filled with placeholder labels.

How to prepare data for the module

The simplest workflow is:

  1. Create a folder containing only painting images.
  2. Pass the folder path as input_dir.
  3. Specify an output_dir where results should be written.

Example:

from museum_map import build_museum_map

build_museum_map(
    input_dir="./data/paintings",
    output_dir="./out/museum_map_run_01",
)

The package recursively scans the input directory and collects all supported images.
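
To preview which files a recursive scan would pick up before starting a run, a minimal sketch along these lines may help. It is not the package's own scanner: `find_images` is a hypothetical helper, and matching extensions case-insensitively is an assumption rather than documented behavior.

```python
from pathlib import Path

# Extensions listed as supported in this README.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif", ".tiff"}


def find_images(input_dir):
    """Return a sorted list of supported image paths found under input_dir."""
    root = Path(input_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    # rglob("*") walks all subdirectories; the suffix check is
    # case-insensitive here, which is an assumption about the package.
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_images("./data/paintings")` and checking the length of the result is a quick way to confirm that the folder contains what you expect.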

Example directory structure

project/
├── data/
│   └── paintings/
│       ├── aivazovsky_001.jpg
│       ├── monet_014.jpg
│       ├── shishkin_003.jpg
│       └── ...
└── out/

Then run:

from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="project/data/paintings",
    output_dir="project/out/run_01",
)

Main output files

After execution, the output directory contains the computational artifacts and the interactive graph.

Typical output:

output_dir/
├── clip_embeddings.npy
├── palette_features.npy
├── composition_features.npy
├── feature_matrix.npy
├── umap_2d.npy
├── cluster_labels.npy
├── df_plot.csv
├── config.csv
├── similarity_graph.html
├── thumbs/
│   ├── thumb_0000.jpg
│   ├── thumb_0001.jpg
│   └── ...
└── museum_map_export.zip
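
After a run, it can be useful to confirm that the artifacts in the listing above were written. The sketch below uses only the file names from that listing; `missing_artifacts` is a hypothetical helper, not part of museum-map, and the exact set of files may differ between versions.

```python
from pathlib import Path

# File and folder names taken from the "Typical output" listing.
EXPECTED_ARTIFACTS = [
    "clip_embeddings.npy",
    "palette_features.npy",
    "composition_features.npy",
    "feature_matrix.npy",
    "umap_2d.npy",
    "cluster_labels.npy",
    "df_plot.csv",
    "config.csv",
    "similarity_graph.html",
    "thumbs",
    "museum_map_export.zip",
]


def missing_artifacts(output_dir):
    """Return the expected artifact names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).exists()]
```

An empty list from `missing_artifacts("./out/museum_map_run_01")` suggests the run completed and exported everything in the listing.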

What these files mean

  • clip_embeddings.npy — semantic image embeddings
  • palette_features.npy — color-based descriptors
  • composition_features.npy — composition descriptors
  • feature_matrix.npy — combined representation used for clustering
  • umap_2d.npy — low-dimensional projection of the collection
  • cluster_labels.npy — cluster assignments
  • df_plot.csv — metadata and coordinates per image
  • config.csv — run configuration
  • similarity_graph.html — interactive graph visualization
  • thumbs/ — thumbnails used inside the HTML graph
  • museum_map_export.zip — packaged result for local sharing or archiving
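
The .npy artifacts can be inspected with NumPy. As a small illustration, the sketch below counts how many paintings fall into each cluster. It assumes that cluster_labels.npy holds a one-dimensional array of integer labels, one per image, which this README does not state explicitly; `cluster_sizes` is a hypothetical helper.

```python
import numpy as np


def cluster_sizes(labels):
    """Map each cluster label to the number of paintings assigned to it."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


# Typical use after a run (the path is illustrative):
# labels = np.load("output_dir/cluster_labels.npy")
# print(cluster_sizes(labels))
```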

What the HTML output contains

The main visual artifact is similarity_graph.html.

It represents the collection as a graph:

  • nodes correspond to paintings
  • each node contains a painting thumbnail
  • node border color indicates cluster membership
  • edges connect paintings with strong local similarity
  • hover displays metadata such as artist, style, genre, cluster, and filename
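
To make the idea of "local similarity" edges concrete, here is one common construction: link each painting to its k most similar neighbors by cosine similarity. This is a sketch under that assumption only. The README does not document the package's actual edge rule, and `knn_edges` is a hypothetical function.

```python
import numpy as np


def knn_edges(features, k=3):
    """Return undirected edges (i, j, similarity) linking each row to its k most similar rows."""
    x = np.asarray(features, dtype=float)
    # L2-normalize rows so the dot product equals cosine similarity.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.clip(norms, 1e-12, None)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    k = max(0, min(k, len(x) - 1))
    edges = {}
    for i in range(len(x)):
        # Indices of the k highest similarities for row i.
        neighbors = np.argsort(sim[i])[::-1][:k]
        for j in neighbors:
            # Store each pair once, regardless of direction.
            key = (min(i, int(j)), max(i, int(j)))
            edges[key] = float(sim[i, j])
    return [(i, j, s) for (i, j), s in sorted(edges.items())]
```

Under this construction, a larger k gives a denser graph, which matches the role described for the graph_k_neighbors parameter.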

This file can be opened locally in a browser. If the browser restricts local file access and the thumbnails do not load, serve the output directory from a lightweight local server instead.

Example (run from inside the output directory):

python -m http.server 8000

Then open:

http://localhost:8000/similarity_graph.html

Configuration

The main entry point accepts optional keyword parameters that control the pipeline.

Example:

from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="./data/paintings",
    output_dir="./out/run_02",
    batch_size=16,
    n_palette_colors=6,
    weight_clip=1.0,
    weight_palette=0.4,
    weight_composition=0.8,
    graph_k_neighbors=3,
    max_per_cluster_for_graph=25,
)

Important parameters include:

  • batch_size — embedding batch size
  • n_palette_colors — number of dominant palette colors
  • weight_clip — weight of CLIP embeddings in the combined representation
  • weight_palette — weight of palette descriptors
  • weight_composition — weight of composition descriptors
  • graph_k_neighbors — number of neighbors in the similarity graph
  • max_per_cluster_for_graph — representative sample size per cluster
  • thumb_size — thumbnail size used in the HTML graph
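
One way to picture how the three weights might interact is to L2-normalize each feature block, scale it by its weight, and concatenate the results. The sketch below shows that approach. It is an assumption for illustration, not the documented internals of museum-map; `l2_normalize` and `combine_features` are hypothetical names, and the default weights are copied from the configuration example above.

```python
import numpy as np


def l2_normalize(block):
    """Scale each row of a 2-D array to unit length."""
    x = np.asarray(block, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def combine_features(clip, palette, composition,
                     weight_clip=1.0, weight_palette=0.4, weight_composition=0.8):
    """Concatenate weighted, row-normalized feature blocks (assumed scheme)."""
    blocks = [
        weight_clip * l2_normalize(clip),
        weight_palette * l2_normalize(palette),
        weight_composition * l2_normalize(composition),
    ]
    # All blocks must have one row per painting.
    return np.hstack(blocks)
```

With this scheme, each block's contribution to squared Euclidean distance scales with the square of its weight, so lowering weight_palette reduces how strongly color alone separates paintings.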

Current limitations

At the current stage, the package has several deliberate limitations:

  • it expects local image folders rather than remote datasets
  • it does not yet ingest structured metadata tables automatically
  • it is optimized for exploratory work rather than industrial-scale deployment
  • it does not yet include optional graph-embedding extensions such as Node2Vec
  • it currently exports the graph view as the main interactive artifact

These points are expected development directions rather than defects.

Roadmap

Planned next steps include:

  • command-line interface
  • optional metadata ingestion from CSV/JSON
  • optional museum map scatter export
  • cluster summaries and automatic cluster labels
  • graph-aware extensions such as Node2Vec
  • richer support for digital humanities and museum collection workflows
  • automated release workflow for PyPI

Minimal example

from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="./paintings",
    output_dir="./museum_map_output",
)

print("Graph saved to:", pipeline.graph_html_path_)
print("ZIP saved to:", pipeline.export_zip_path_)

Citation

If you use this package in academic work, please cite the corresponding paper or software record. A suggested citation file is included as CITATION.cff.

License

This project is distributed under the MIT License. See LICENSE for details.
