Interpretable clustering and graph-based visualization of painting collections

museum-map

museum-map is a Python package for interpretable clustering and graph-based visualization of painting collections.

The package is designed for exploratory work with art collections, especially in contexts where users need not only a clustering result, but also a visual and inspectable representation of relationships between artworks. The current implementation combines CLIP-based image embeddings with explicitly defined visual descriptors for color distribution and spatial composition, projects the resulting feature space, detects clusters, and exports an interactive similarity graph with painting thumbnails.

The intended audience includes researchers in digital humanities, museum professionals, curators, collection managers, and computational analysts working with visual collections.

Installation

Install from PyPI:

pip install museum-map

Or install the latest development version from GitHub:

pip install git+https://github.com/Frantsuzova/museum-map.git

For local development:

git clone https://github.com/Frantsuzova/museum-map.git
cd museum-map
pip install -e .

What the package does

Given a folder of painting images, museum-map:

  • computes semantic image embeddings using a CLIP image encoder
  • extracts palette-based descriptors
  • extracts composition-based descriptors
  • combines these signals into a shared representation
  • detects clusters in the collection
  • selects representative paintings from each cluster
  • builds an interactive HTML graph where:
    • each node is a painting
    • node image = painting thumbnail
    • node border color = cluster membership
    • edges = local similarity relationships
  • exports intermediate artifacts and a ready-to-share zip archive

This workflow is intended to support exploratory analysis of collections and to help identify non-obvious relationships between paintings.
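One way to picture the "shared representation" step is a weighted concatenation of the three feature blocks. The sketch below is an illustration under stated assumptions, not the package's confirmed implementation: the helper names `l2_normalize` and `combine_features` are hypothetical, and row-wise L2 normalization before weighting is an assumed design choice. Only the three weight names and their example values come from the Configuration section.

```python
import numpy as np


def l2_normalize(block: np.ndarray) -> np.ndarray:
    """Scale each row to unit length; all-zero rows stay zero."""
    norms = np.linalg.norm(block, axis=1, keepdims=True)
    return block / np.where(norms == 0, 1.0, norms)


def combine_features(clip, palette, composition,
                     weight_clip=1.0, weight_palette=0.4,
                     weight_composition=0.8):
    """Concatenate row-normalized blocks, each scaled by its weight.

    Hypothetical helper: the normalization step is an assumption,
    not something the package documentation states.
    """
    blocks = [
        weight_clip * l2_normalize(np.asarray(clip, dtype=float)),
        weight_palette * l2_normalize(np.asarray(palette, dtype=float)),
        weight_composition * l2_normalize(np.asarray(composition, dtype=float)),
    ]
    return np.hstack(blocks)


# Toy data: 5 images with made-up descriptor widths.
rng = np.random.default_rng(0)
combined = combine_features(
    rng.normal(size=(5, 16)),  # stand-in for CLIP embeddings
    rng.random(size=(5, 6)),   # stand-in for palette descriptors
    rng.random(size=(5, 4)),   # stand-in for composition descriptors
)
print(combined.shape)
```

Normalizing each block first means a weight controls that block's contribution to distances regardless of its raw scale or width, which is why a scheme like this makes the weights easier to reason about.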

Quick start

Minimal Python usage

from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="/path/to/paintings",
    output_dir="/path/to/output",
)

print(pipeline.graph_html_path_)
print(pipeline.export_zip_path_)

This is the simplest workflow: a single function call takes a folder of images and produces the exported interactive result.

What input data can be used

The package currently expects a local folder of images.

Supported image extensions are:

  • .jpg
  • .jpeg
  • .png
  • .webp
  • .bmp
  • .tif
  • .tiff

Typical input scenarios

You can use:

  • a subset of a public art dataset such as WikiArt
  • a digitized museum collection exported as image files
  • a folder with paintings gathered for a pilot experiment
  • a thematic subcollection, for example portraits, landscapes, or one author’s works

Current assumptions about metadata

The current pipeline is image-first. It does not require metadata to run.

This means you can start with a plain folder of images:

my_collection/
├── painting_001.jpg
├── painting_002.jpg
├── painting_003.png
└── ...

Metadata such as artist, style, genre, accession number, or inventory ID is not ingested yet, even when it is available; support is planned for future versions. In the current scaffold, unknown values are filled with placeholder labels.

How to prepare data for the module

The simplest workflow is:

  1. Create a folder containing only painting images.
  2. Pass the folder path as input_dir.
  3. Specify an output_dir where results should be written.

Example:

from museum_map import build_museum_map

build_museum_map(
    input_dir="./data/paintings",
    output_dir="./out/museum_map_run_01",
)

The package recursively scans the input directory and collects all supported images.
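Before a long run, it can help to preview which files a recursive scan would pick up. The following is a minimal sketch that approximates the behaviour described above; `find_images` is a hypothetical helper, not part of museum-map, and case-insensitive extension matching is an assumption about how the package treats names such as `IMG_01.JPG`.

```python
from pathlib import Path

# Extensions listed in this README as supported.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif", ".tiff"}


def find_images(input_dir):
    """Return sorted paths of files under input_dir with a supported extension.

    Matching is case-insensitive here; that is an assumption, and the
    package itself may behave differently.
    """
    root = Path(input_dir)
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


# Usage:
#   images = find_images("./data/paintings")
#   print(len(images), "candidate images")
```

If the count differs from what the pipeline later reports, stray non-image files or unsupported formats in the folder are the first thing to check.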

Example directory structure

project/
├── data/
│   └── paintings/
│       ├── aivazovsky_001.jpg
│       ├── monet_014.jpg
│       ├── shishkin_003.jpg
│       └── ...
└── out/

Then run:

from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="project/data/paintings",
    output_dir="project/out/run_01",
)

Main output files

After execution, the output directory contains the computational artifacts and the interactive graph.

Typical output:

output_dir/
├── clip_embeddings.npy
├── palette_features.npy
├── composition_features.npy
├── feature_matrix.npy
├── umap_2d.npy
├── cluster_labels.npy
├── df_plot.csv
├── config.csv
├── similarity_graph.html
├── thumbs/
│   ├── thumb_0000.jpg
│   ├── thumb_0001.jpg
│   └── ...
└── museum_map_export.zip

What these files mean

  • clip_embeddings.npy — semantic image embeddings
  • palette_features.npy — color-based descriptors
  • composition_features.npy — composition descriptors
  • feature_matrix.npy — combined representation used for clustering
  • umap_2d.npy — low-dimensional projection of the collection
  • cluster_labels.npy — cluster assignments
  • df_plot.csv — metadata and coordinates per image
  • config.csv — run configuration
  • similarity_graph.html — interactive graph visualization
  • thumbs/ — thumbnails used inside the HTML graph
  • museum_map_export.zip — packaged result for local sharing or archiving
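The `.npy` artifacts can be inspected directly with NumPy. The sketch below loads whichever of the listed arrays exist and counts items per cluster. The helper names `load_arrays` and `cluster_sizes` are hypothetical, and the code assumes that the arrays are plain numeric arrays and that cluster labels are integers; neither is confirmed by the package documentation.

```python
from pathlib import Path

import numpy as np

# File names taken from the output listing in this README.
ARRAY_FILES = [
    "clip_embeddings.npy",
    "palette_features.npy",
    "composition_features.npy",
    "feature_matrix.npy",
    "umap_2d.npy",
    "cluster_labels.npy",
]


def load_arrays(output_dir):
    """Load every listed .npy file that exists, keyed by file stem."""
    out = Path(output_dir)
    return {
        Path(name).stem: np.load(out / name)
        for name in ARRAY_FILES
        if (out / name).exists()
    }


def cluster_sizes(labels):
    """Map each cluster label to the number of items carrying it."""
    values, counts = np.unique(labels, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


# Usage:
#   arrays = load_arrays("./out/museum_map_run_01")
#   for name, arr in arrays.items():
#       print(name, arr.shape)
#   if "cluster_labels" in arrays:
#       print(cluster_sizes(arrays["cluster_labels"]))
```

Comparing the first dimension of each array is a quick sanity check that all artifacts describe the same set of images.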

What the HTML output contains

The main visual artifact is similarity_graph.html.

It represents the collection as a graph:

  • nodes correspond to paintings
  • each node contains a painting thumbnail
  • node border color indicates cluster membership
  • edges connect paintings with strong local similarity
  • hover displays metadata such as artist, style, genre, cluster, and filename
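To make "strong local similarity" concrete, edges of this kind are commonly built as a k-nearest-neighbor graph. The sketch below shows that idea with cosine similarity. It is an illustration only: `knn_edges` is a hypothetical function, cosine similarity is an assumed metric, and the package may construct its graph differently. The default `k=3` mirrors the `graph_k_neighbors=3` value used in the configuration example.

```python
import numpy as np


def knn_edges(features, k=3):
    """Link each row to its k most cosine-similar rows.

    Returns a sorted list of undirected edges (i, j) with i < j.
    Expects a non-empty 2-D array of shape (n_items, n_features).
    """
    x = np.asarray(features, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.where(norms == 0, 1.0, norms)

    sim = x @ x.T                   # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)  # an item is never its own neighbor

    k = min(k, len(x) - 1)
    if k < 1:
        return []

    # Indices of the k highest similarities in each row.
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    edges = set()
    for i, row in enumerate(neighbors):
        for j in row:
            j = int(j)
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Toy example: two tight pairs of 2-D points.
points = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(knn_edges(points, k=1))
```

Because each item only links to its closest neighbors, the graph stays sparse and readable even when the collection is large, which is the usual reason to prefer this construction over thresholding all pairwise similarities.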

This file can be opened locally in a browser. If the browser restricts local file access and the thumbnails do not load, serve the output directory from a lightweight local server instead.

Example, run from inside the output directory:

python -m http.server 8000

Then open:

http://localhost:8000/similarity_graph.html

Configuration

The main entry point accepts optional keyword parameters that control the pipeline.

Example:

from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="./data/paintings",
    output_dir="./out/run_02",
    batch_size=16,
    n_palette_colors=6,
    weight_clip=1.0,
    weight_palette=0.4,
    weight_composition=0.8,
    graph_k_neighbors=3,
    max_per_cluster_for_graph=25,
)

Important parameters include:

  • batch_size — embedding batch size
  • n_palette_colors — number of dominant palette colors
  • weight_clip — weight of CLIP embeddings in the combined representation
  • weight_palette — weight of palette descriptors
  • weight_composition — weight of composition descriptors
  • graph_k_neighbors — number of neighbors in the similarity graph
  • max_per_cluster_for_graph — representative sample size per cluster
  • thumb_size — thumbnail size used in the HTML graph
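Since the weights shape the combined representation, comparing a few settings side by side is a natural exploratory step. The sketch below only generates keyword-argument dictionaries, one per weight combination, each with its own output folder. `sweep_configs` and the run-folder naming scheme are hypothetical; the keyword names are the ones listed above.

```python
from itertools import product
from pathlib import Path


def sweep_configs(input_dir, output_root, palette_weights, composition_weights):
    """Yield one keyword-argument dict per weight combination."""
    for w_pal, w_comp in product(palette_weights, composition_weights):
        run_name = f"pal{w_pal}_comp{w_comp}"  # hypothetical naming scheme
        yield {
            "input_dir": str(input_dir),
            "output_dir": str(Path(output_root) / run_name),
            "weight_palette": w_pal,
            "weight_composition": w_comp,
        }


# Usage (requires museum-map to be installed):
#   from museum_map import build_museum_map
#   for kwargs in sweep_configs("./data/paintings", "./out/sweep",
#                               [0.2, 0.4], [0.4, 0.8]):
#       build_museum_map(**kwargs)
```

Writing each run to a separate folder keeps the exported graphs comparable without overwriting earlier results.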

Current limitations

At the current stage, the package has several deliberate limitations:

  • it expects local image folders rather than remote datasets
  • it does not yet ingest structured metadata tables automatically
  • it is optimized for exploratory work rather than industrial-scale deployment
  • it does not yet include optional graph-embedding extensions such as Node2Vec
  • it currently exports the graph view as the main interactive artifact

These points are expected development directions rather than defects.

Roadmap

Planned next steps include:

  • command-line interface
  • optional metadata ingestion from CSV/JSON
  • optional museum map scatter export
  • cluster summaries and automatic cluster labels
  • graph-aware extensions such as Node2Vec
  • richer support for digital humanities and museum collection workflows
  • automated release workflow for PyPI

Minimal example

from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="./paintings",
    output_dir="./museum_map_output",
)

print("Graph saved to:", pipeline.graph_html_path_)
print("ZIP saved to:", pipeline.export_zip_path_)

Citation

If you use this package in academic work, please cite the corresponding paper or software record. A suggested citation file is included as CITATION.cff.

License

This project is distributed under the MIT License. See LICENSE for details.

Download files

Source Distribution

museum_map-0.1.2.tar.gz (13.3 kB view details)

Uploaded Source

Built Distribution

museum_map-0.1.2-py3-none-any.whl (11.7 kB view details)

Uploaded Python 3

File details

Details for the file museum_map-0.1.2.tar.gz.

File metadata

  • Download URL: museum_map-0.1.2.tar.gz
  • Upload date:
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for museum_map-0.1.2.tar.gz
Algorithm Hash digest
SHA256 87e5082d19d5877f416f7a2a540a529d3e672ed60f877b5bc17a4173b1debcca
MD5 00ce9138feeff01d03bfe6da8b05f527
BLAKE2b-256 13c6270b60e02faa46b440a9165ef707dede14b0f71635c3aad5e7fb7a3975fe

File details

Details for the file museum_map-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: museum_map-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for museum_map-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7c57c08cb3beabdfce0d0bf309f71a844d9bc22766de993e8a7f9a278d8672fd
MD5 a33ff024de5b765299fd17d06f05772a
BLAKE2b-256 2c09d21639f56d627f4b1dbacd6ee327b3137f827b120a7a15d097c53491215d
