Interpretable clustering and graph-based visualization of painting collections
museum-map
museum-map is a Python package for interpretable clustering and graph-based visualization of painting collections.
The package is designed for exploratory work with art collections, especially in contexts where users need not only a clustering result, but also a visual and inspectable representation of relationships between artworks. The current implementation combines CLIP-based image embeddings with explicitly defined visual descriptors for color distribution and spatial composition, projects the resulting feature space, detects clusters, and exports an interactive similarity graph with painting thumbnails.
The intended audience includes researchers in digital humanities, museum professionals, curators, collection managers, and computational analysts working with visual collections.
Installation
Install from PyPI:
pip install museum-map
Or install the latest development version from GitHub:
pip install git+https://github.com/Frantsuzova/museum-map.git
For local development:
git clone https://github.com/Frantsuzova/museum-map.git
cd museum-map
pip install -e .
What the package does
Given a folder of painting images, museum-map:
- computes semantic image embeddings using a CLIP image encoder
- extracts palette-based descriptors
- extracts composition-based descriptors
- combines these signals into a shared representation
- detects clusters in the collection
- selects representative paintings from each cluster
- builds an interactive HTML graph where:
  - each node is a painting
  - node image = painting thumbnail
  - node border color = cluster membership
  - edges = local similarity relationships
- exports intermediate artifacts and a ready-to-share zip archive
This workflow is intended to support exploratory analysis of collections and to help identify non-obvious relationships between paintings.
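To make "local similarity relationships" concrete, the sketch below links each item to its k most similar neighbors by cosine similarity. It is an illustration only: the helper names `cosine_similarity` and `knn_edges` are hypothetical, the toy vectors are made up, and the similarity measure museum-map uses internally may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_edges(features, k=3):
    """Return sorted undirected edges (i, j) with i < j.

    Each item is linked to its k most similar other items.
    This is a hypothetical helper, not part of the museum-map API.
    """
    edges = set()
    for i, vec in enumerate(features):
        scored = [
            (cosine_similarity(vec, other), j)
            for j, other in enumerate(features)
            if j != i
        ]
        # Most similar first; ties are broken by the lower index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, j in scored[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Toy feature vectors: items 0 and 1 point one way, items 2 and 3 another.
features = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
edges = knn_edges(features, k=1)
print(edges)
```

With k=1 the toy data yields one edge inside each pair of similar vectors, which is the kind of local structure the graph view is meant to expose.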
Quick start
Minimal Python usage
from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="/path/to/paintings",
    output_dir="/path/to/output",
)

print(pipeline.graph_html_path_)
print(pipeline.export_zip_path_)
This is the simplest workflow, a single function call: point the package at a folder of images and receive an exported interactive result.
What input data can be used
The package currently expects a local folder of images.
Supported image extensions are:
.jpg, .jpeg, .png, .webp, .bmp, .tif, .tiff
Typical input scenarios
You can use:
- a subset of a public art dataset such as WikiArt
- a digitized museum collection exported as image files
- a folder with paintings gathered for a pilot experiment
- a thematic subcollection, for example portraits, landscapes, or the works of a single artist
Current assumptions about metadata
The current pipeline is image-first. It does not require metadata to run.
This means you can start with a plain folder of images:
my_collection/
├── painting_001.jpg
├── painting_002.jpg
├── painting_003.png
└── ...
If metadata such as artist, style, genre, accession number, or inventory ID is available, it can be integrated in future versions. In the current scaffold, unknown values are filled with placeholder labels.
How to prepare data for the module
The simplest workflow is:
- Create a folder containing only painting images.
- Pass the folder path as input_dir.
- Specify an output_dir where results should be written.
Example:
from museum_map import build_museum_map

build_museum_map(
    input_dir="./data/paintings",
    output_dir="./out/museum_map_run_01",
)
The package recursively scans the input directory and collects all supported images.
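Before a run, it can help to check which files such a recursive scan would pick up. The sketch below uses only the standard library; `find_images` and `SUPPORTED_EXTENSIONS` are hypothetical names rather than part of the museum-map API, and the case-insensitive extension match is an assumption.

```python
from pathlib import Path

# Extensions listed as supported in this README.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp", ".tif", ".tiff"}


def find_images(input_dir):
    """Recursively collect files under input_dir with a supported extension.

    Hypothetical helper: extensions are matched case-insensitively,
    which is an assumption about how museum-map behaves.
    """
    root = Path(input_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

For example, `len(find_images("./data/paintings"))` gives a quick count of candidate images, which is useful for spotting an empty or mistyped input path.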
Example directory structure
project/
├── data/
│   └── paintings/
│       ├── aivazovsky_001.jpg
│       ├── monet_014.jpg
│       ├── shishkin_003.jpg
│       └── ...
└── out/
Then run:
from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="project/data/paintings",
    output_dir="project/out/run_01",
)
Main output files
After execution, the output directory contains the computational artifacts and the interactive graph.
Typical output:
output_dir/
├── clip_embeddings.npy
├── palette_features.npy
├── composition_features.npy
├── feature_matrix.npy
├── umap_2d.npy
├── cluster_labels.npy
├── df_plot.csv
├── config.csv
├── similarity_graph.html
├── thumbs/
│   ├── thumb_0000.jpg
│   ├── thumb_0001.jpg
│   └── ...
└── museum_map_export.zip
What these files mean
- clip_embeddings.npy: semantic image embeddings
- palette_features.npy: color-based descriptors
- composition_features.npy: composition descriptors
- feature_matrix.npy: combined representation used for clustering
- umap_2d.npy: low-dimensional projection of the collection
- cluster_labels.npy: cluster assignments
- df_plot.csv: metadata and coordinates per image
- config.csv: run configuration
- similarity_graph.html: interactive graph visualization
- thumbs/: thumbnails used inside the HTML graph
- museum_map_export.zip: packaged result for local sharing or archiving
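A quick way to confirm that a run completed is to check the output directory against the file list above. The sketch below is a standalone helper, not part of museum-map; the names `EXPECTED_ARTIFACTS` and `missing_artifacts` are hypothetical, and the list simply mirrors the "typical output" shown in this README, which may vary between versions.

```python
from pathlib import Path

# Names taken from the typical output listing in this README.
EXPECTED_ARTIFACTS = [
    "clip_embeddings.npy",
    "palette_features.npy",
    "composition_features.npy",
    "feature_matrix.npy",
    "umap_2d.npy",
    "cluster_labels.npy",
    "df_plot.csv",
    "config.csv",
    "similarity_graph.html",
    "thumbs",
    "museum_map_export.zip",
]


def missing_artifacts(output_dir):
    """Return the expected artifact names that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).exists()]
```

An empty list from `missing_artifacts("./out/run_01")` suggests that every stage wrote its result; any names returned point to the stage worth inspecting first.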
What the HTML output contains
The main visual artifact is similarity_graph.html.
It represents the collection as a graph:
- nodes correspond to paintings
- each node contains a painting thumbnail
- node border color indicates cluster membership
- edges connect paintings with strong local similarity
- hover displays metadata such as artist, style, genre, cluster, and filename
This file can be opened locally in a browser. If your browser restricts local file access and the thumbnails do not load, serve the output directory from a lightweight local server instead.
Example (run from inside the output directory):
python -m http.server 8000
Then open:
http://localhost:8000/similarity_graph.html
Configuration
The main entry point accepts optional keyword parameters that control the pipeline.
Example:
from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="./data/paintings",
    output_dir="./out/run_02",
    batch_size=16,
    n_palette_colors=6,
    weight_clip=1.0,
    weight_palette=0.4,
    weight_composition=0.8,
    graph_k_neighbors=3,
    max_per_cluster_for_graph=25,
)
Important parameters include:
- batch_size: embedding batch size
- n_palette_colors: number of dominant palette colors
- weight_clip: weight of CLIP embeddings in the combined representation
- weight_palette: weight of palette descriptors
- weight_composition: weight of composition descriptors
- graph_k_neighbors: number of neighbors in the similarity graph
- max_per_cluster_for_graph: representative sample size per cluster
- thumb_size: thumbnail size used in the HTML graph
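To give an intuition for what the three weights do, the sketch below shows one common way to combine feature blocks: L2-normalize each block, scale it by its weight, and concatenate. This is an assumption for illustration, not a description of museum-map's internals; `l2_normalize` and `combine_features` are hypothetical helpers, the weight values mirror the example call above, and plain lists are used so that the snippet runs without extra dependencies.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine_features(clip_vec, palette_vec, composition_vec,
                     weight_clip=1.0, weight_palette=0.4, weight_composition=0.8):
    """Concatenate three normalized feature blocks, each scaled by its weight.

    Hypothetical sketch: the actual combination used by museum-map may differ.
    """
    blocks = (
        (clip_vec, weight_clip),
        (palette_vec, weight_palette),
        (composition_vec, weight_composition),
    )
    combined = []
    for vec, weight in blocks:
        combined.extend(weight * x for x in l2_normalize(vec))
    return combined


# Toy vectors standing in for the three descriptor types.
combined = combine_features([3.0, 4.0], [2.0, 0.0], [0.0, 5.0])
print(combined)
```

Under this scheme, a larger weight lets a block contribute more to distances in the combined space, so raising weight_palette would pull paintings with similar colors closer together.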
Current limitations
At the current stage, the package has several deliberate limitations:
- it expects local image folders rather than remote datasets
- it does not yet ingest structured metadata tables automatically
- it is optimized for exploratory work rather than industrial-scale deployment
- it does not yet include optional graph-embedding extensions such as Node2Vec
- it currently exports the graph view as the main interactive artifact
These points are expected development directions rather than defects.
Roadmap
Planned next steps include:
- command-line interface
- optional metadata ingestion from CSV/JSON
- optional museum map scatter export
- cluster summaries and automatic cluster labels
- graph-aware extensions such as Node2Vec
- richer support for digital humanities and museum collection workflows
- automated release workflow for PyPI
Minimal example
from museum_map import build_museum_map

pipeline = build_museum_map(
    input_dir="./paintings",
    output_dir="./museum_map_output",
)

print("Graph saved to:", pipeline.graph_html_path_)
print("ZIP saved to:", pipeline.export_zip_path_)
Citation
If you use this package in academic work, please cite the corresponding paper or software record. A suggested citation file is included as CITATION.cff.
License
This project is distributed under the MIT License. See LICENSE for details.