artsleuth

Computational Art Analysis Framework — Brushstroke analysis, style attribution, anomaly screening, and interpretable visual explanations powered by vision transformers.

✦ About

ArtSleuth is a computational art-analysis framework that formalises what connoisseurs have done for centuries — examining the physical evidence a painter leaves on a canvas — using machine learning.

Brushstroke directionality, impasto relief, palette temperature, and the habitual gestures that reside in the least-scrutinised passages of a painting (drapery folds, background foliage, the rendering of earlobes): these are the signals that distinguish one hand from another, and they map naturally onto what self-supervised vision transformers learn to encode.

Capability	Method
Stroke orientation, coherence, energy, curvature with patch-level clustering	Structure tensor on image gradients + DINOv2 patch embeddings
Period, school, and genre prediction	CLIP embeddings through learned linear heads (period and genre pretrained; school randomly initialised)
Embedding-space comparison with temporal plausibility scoring	Cosine similarity with GP-based date estimation
Bayesian inference of distinct hands in collaborative paintings	Dirichlet process Gaussian mixture model
One-class anomaly scoring with adversarial robustness testing	Mahalanobis distance plus historical forgery simulation
Complementary features from two vision transformers	Concatenation at inference; cross-attention available for training
Models how an artist's style evolves over decades	Gaussian process regression in embedding space (requires user-supplied dated references)
Visual heatmaps highlighting regions the model considers salient	Gradient-based saliency maps
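
As an illustration of the structure-tensor row in the table, here is a minimal NumPy sketch of how stroke orientation, coherence, and energy can be derived from image gradients. It is not ArtSleuth's implementation: the function names, the box-blur smoothing (a stand-in for the more usual Gaussian window), and the `radius` parameter are all assumptions made for this example.

```python
import numpy as np

def box_blur(a: np.ndarray, radius: int) -> np.ndarray:
    """Average each pixel over a (2r+1) x (2r+1) window, using edge padding."""
    k = 2 * radius + 1
    padded = np.pad(a, radius, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)

def stroke_features(gray: np.ndarray, radius: int = 2):
    """Return per-pixel stroke orientation (radians), coherence in [0, 1], and energy."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    gy, gx = np.gradient(gray.astype(float))
    # Locally averaged entries of the 2x2 structure tensor.
    jxx = box_blur(gx * gx, radius)
    jyy = box_blur(gy * gy, radius)
    jxy = box_blur(gx * gy, radius)
    # Angle of the dominant gradient direction; strokes run perpendicular to it.
    grad_angle = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    orientation = grad_angle + np.pi / 2.0
    # Energy is the tensor trace (sum of the two eigenvalues).
    energy = jxx + jyy
    # Eigenvalue gap; coherence is gap / trace, 1 for a single clear direction.
    gap = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2)
    coherence = np.where(energy > 1e-12, gap / np.maximum(energy, 1e-12), 0.0)
    return orientation, coherence, energy
```

On an image of vertical stripes, this sketch reports an orientation of π/2 and a coherence of 1 everywhere, which is the behaviour one would expect for perfectly parallel strokes.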

✦ What's Novel

ArtSleuth combines several techniques that are typically studied in isolation:

Style-Guided Cross-Attention Fusion — CLIP's semantic understanding directs DINOv2's patch-level attention via multi-head cross-attention with learned temperature, producing fused features neither backbone achieves alone.
Temporal Style Drift Modelling — Gaussian process regression over time-stamped reference embeddings captures how an artist's hand evolves across decades, reporting temporal plausibility as a separate signal. Requires user-supplied dated references; no bundled data is shipped.
Hierarchical Workshop Decomposition — A Dirichlet process Gaussian mixture model automatically infers the number of distinct hands in a painting, replacing flat k-means with art-historically grounded probabilistic clustering.
Adversarial Forgery Robustness — Stress-tests detection against simulated historical forgery techniques (artificial aging, style transfer perturbation, material anachronism) at multiple severity levels.
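
To make the cross-attention idea concrete, the following is a single-head NumPy sketch in which a CLIP image embedding acts as the query over DINOv2 patch tokens, with a log-temperature scalar rescaling the attention logits. The projection matrices, the single head, and the choice to concatenate the attended vector with the CLIP embedding are assumptions for illustration; the source describes a multi-head variant and does not specify these details.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention_fuse(clip_vec, dino_patches, w_q, w_k, w_v, log_temperature=0.0):
    """Attend over DINOv2 patches using the CLIP embedding as the query.

    clip_vec:      (d_clip,)            global CLIP embedding
    dino_patches:  (n_patches, d_dino)  DINOv2 patch tokens
    w_q, w_k, w_v: projection matrices (hypothetical; learned in training)
    log_temperature: scalar; exp() of it multiplies the usual sqrt(d) scale
    """
    q = clip_vec @ w_q                  # (d_attn,)
    k = dino_patches @ w_k              # (n_patches, d_attn)
    v = dino_patches @ w_v              # (n_patches, d_attn)
    scale = np.exp(log_temperature) * np.sqrt(q.shape[-1])
    weights = softmax(k @ q / scale)    # (n_patches,), sums to 1
    attended = weights @ v              # (d_attn,)
    # Assumed fusion rule: keep the CLIP vector and append the attended patches.
    fused = np.concatenate([clip_vec, attended])
    return fused, weights
```

A higher temperature flattens the attention weights toward a uniform average of patches, while a lower one concentrates them on the few patches most aligned with the CLIP query.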

✦ Quick Start

Prefer not to install anything? Try ArtSleuth live on HuggingFace Spaces. Pretrained weights are on the HuggingFace Hub.

pip install artsleuth

Python

import artsleuth

result = artsleuth.analyze("judith_slaying_holofernes.jpg")
print(result.summary())

explanation = result.explain()
explanation.save("analysis_overlay.png")

CLI

artsleuth analyze painting.jpg
artsleuth style painting.jpg --top-k 5
artsleuth compare painting_a.jpg painting_b.jpg
artsleuth workshop painting.jpg
artsleuth robustness painting.jpg -r "Artemisia Gentileschi"
artsleuth benchmark --backbone dinov2 --backbone fusion
artsleuth demo

✦ Architecture

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1A2E48', 'primaryTextColor': '#F0F0F0', 'primaryBorderColor': '#9DC0D8', 'lineColor': '#9DC0D8', 'secondaryColor': '#1A2E48', 'tertiaryColor': '#1A2E48', 'edgeLabelBackground': '#0D1117', 'clusterBkg': '#0D1117', 'clusterBorder': '#9DC0D8', 'titleColor': '#9DC0D8'}}}%%

graph TD
    %% Temporal drift is optional (off by default; needs user TemporalRegistry data)
    Input["Artwork Image"] --> Resize["Resize · Crop · Normalise"]

    Resize --> Patches["Patch Extraction"]
    Resize --> FullImage["Full-Image Encoding"]

    Patches --> DINO["DINOv2"]
    FullImage --> CLIPEnc["CLIP"]

    DINO --> Brushstroke["Brushstroke Analysis"]
    DINO --> WorkshopNode["Workshop Decomposition"]
    CLIPEnc --> Style["Style Classification"]

    DINO --> Concat["Feature Concatenation"]
    CLIPEnc --> Concat

    Concat --> Attribution["Attribution Scoring"]
    Attribution -.->|optional| Temporal["Temporal Drift Model"]

    Concat --> Forgery["Forgery Detection"]
    Concat -.->|optional| Adversarial["Adversarial Robustness"]

    DINO --> Explain["Saliency Maps"]

    Forgery --> Report["Analysis Report"]
    Attribution --> Report
    Temporal -.-> Report
    Style --> Report
    Brushstroke --> Report
    WorkshopNode --> Report
    Explain --> Report

    Report --> WebUI["Web UI"]
    Report --> CLI["CLI"]
    Report --> MCP["MCP Server"]

The inference pipeline concatenates CLIP + DINOv2 embeddings. Cross-attention fusion is available as a training-time architecture (used in the benchmark fine-tuning) but is not part of the default inference path.

Backbone	Strength	Used For
DINOv2	Fine-grained texture and structure	Brushstroke analysis · patch embeddings
CLIP	Semantic-stylistic understanding	Style classification · style embeddings
Concat	Complementary feature combination	Attribution · forgery detection

Default backbone sizes: DINOv2 ViT-B/14 + CLIP ViT-L/14. First run downloads ~1–2 GB of pretrained weights from HuggingFace Hub.
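
The concatenation path and the cosine-similarity comparison used for attribution can be sketched in a few lines of NumPy. This is an illustrative sketch, not the library's code: the helper names are hypothetical, and normalising each embedding before concatenation (so neither backbone dominates by scale) is an assumption, not something the source states.

```python
import numpy as np

def l2_normalise(v: np.ndarray) -> np.ndarray:
    """Scale a vector (or each row of a matrix) to unit Euclidean length."""
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norm, 1e-12)

def fuse_concat(dino_emb: np.ndarray, clip_emb: np.ndarray) -> np.ndarray:
    """Concatenate per-backbone embeddings after normalising each one (assumed)."""
    return np.concatenate([l2_normalise(dino_emb), l2_normalise(clip_emb)], axis=-1)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two 1-D embeddings."""
    return float(l2_normalise(a) @ l2_normalise(b))
```

With two fused vectors in hand, `cosine_similarity(fused_a, fused_b)` gives a score in [-1, 1], with higher values indicating closer embeddings.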

✦ Benchmark

Linear probe and end-to-end evaluation on the full WikiArt dataset (81 444 images, 80/20 split, seed 42):

Backbone	Style Acc	Style F1	Artist Acc	Artist Top-5	Genre Acc
DINOv2 · ViT-B/14	57.5 %	0.553	64.7 %	90.9 %	71.0 %
CLIP · ViT-L/14	67.1 %	0.656	74.6 %	95.9 %	75.0 %
Fusion · frozen	65.0 %	0.633	71.0 %	94.2 %	74.2 %
Fusion · fine-tuned †	71.6 %	0.703	77.8 %	96.2 %	75.1 %
Fusion · e2e †	72.7 %	—	79.0 %	96.9 %	76.6 %

Top three rows: logistic-regression linear probes on frozen backbones (macro-averaged across 27 styles, 129 artists, 11 genres). Reproducible via benchmarks/wikiart.py. † Bottom two rows: separate training runs with partial backbone unfreezing (last 3 transformer blocks), multi-task CE + supervised contrastive loss, AdamW, mixed-precision (5 epochs, effective batch 64). Training code not included in this repository; these numbers are reported for context.

Reproduce the frozen linear-probe benchmarks:

pip install artsleuth[benchmarks]
artsleuth benchmark --device cuda

Forgery detection validation (one-class authentication)

For each of 125 named artists (≥ 80 works, excluding the catch-all "Unknown Artist" category), we fit a Mahalanobis-distance reference model from 80 % of their authenticated WikiArt works, then test whether held-out genuine paintings score lower (closer to the reference distribution) than impostor paintings by other artists. ROC-AUC = 1.0 means perfect separation; 0.5 means chance.

	DINOv2 ViT-B/14	CLIP ViT-L/14	Fused (concat)
Mean AUC	0.873	0.958	0.897
Median AUC	0.895	0.970	0.918
AUC ≥ 0.95	28 / 125 artists	81 / 125 artists	36 / 125 artists
AUC ≥ 0.90	62 / 125 artists	113 / 125 artists	75 / 125 artists

Top 15 and bottom 5 by fused AUC:

Artist	Works	DINOv2	CLIP	Fused
Sam Francis	317	1.000	1.000	1.000
Antoine Blanchard	170	1.000	1.000	1.000
Gene Davis	155	1.000	1.000	1.000
Fra Angelico	167	0.995	0.990	0.998
Juan Gris	196	0.993	0.996	0.998
Frans Hals	176	0.992	0.999	0.997
Édouard Cortès	214	0.992	1.000	0.995
El Greco	159	0.981	1.000	0.993
Fernand Léger	223	0.987	0.976	0.993
Anthony van Dyck	163	0.985	1.000	0.992
Maxime Maufra	119	0.991	0.998	0.991
Joshua Reynolds	200	0.976	0.999	0.989
Henri Fantin-Latour	105	0.977	1.000	0.989
Ivan Aivazovsky	577	0.980	0.998	0.986
Gustave Moreau	83	0.965	1.000	0.983
…
Salvador Dalí	479	0.675	0.876	0.725
Vasily Polenov	225	0.684	0.926	0.723
Jacek Malczewski	91	0.659	0.961	0.709
Mikhail Vrubel	95	0.618	0.823	0.654
M. C. Escher	126	0.610	0.899	0.649

Mahalanobis-distance one-class classification on WikiArt (125 named artists, 80/20 split, equal genuine/impostor test sets, seed 42). Artists with distinctive visual signatures (El Greco, Fra Angelico, ukiyo-e prints) approach perfect separation; stylistically versatile artists (Dalí, Escher) are harder to model as a single distribution. Full per-artist results in artsleuth/benchmarks/forgery_validation_results.json.
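
The validation protocol described above (fit a reference distribution, score held-out genuine and impostor works, compute ROC-AUC) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the code behind the reported numbers: the function names are hypothetical, and the diagonal `shrinkage` term added to the covariance is an assumption made here to keep the matrix invertible.

```python
import numpy as np

def fit_reference(embeddings: np.ndarray, shrinkage: float = 1e-3):
    """Fit mean and inverse covariance from an (n_works, dim) reference set."""
    mean = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False)
    cov += shrinkage * np.eye(cov.shape[0])   # assumed regulariser
    return mean, np.linalg.inv(cov)

def mahalanobis(x: np.ndarray, mean: np.ndarray, precision: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of each row of x from the reference distribution."""
    d = np.atleast_2d(x) - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", d, precision, d))

def roc_auc(genuine_scores, impostor_scores) -> float:
    """Probability that a random impostor scores higher (farther) than a
    random genuine work, counting ties as one half."""
    g = np.asarray(genuine_scores)[:, None]
    i = np.asarray(impostor_scores)[None, :]
    return float(((i > g) + 0.5 * (i == g)).mean())
```

Because genuine works are expected to sit closer to the reference distribution, an AUC of 1.0 here means every impostor is farther away than every genuine work, matching the convention used in the table.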

✦ Related Work & Honest Limitations

Automated art classification has a rich history, and ArtSleuth builds on prior work that we want to acknowledge properly.

Prior art in style classification. Saleh & Elgammal (2016) were among the first to apply metric learning to large-scale art datasets. Tan et al. (2016) trained a ResNet-50 on WikiArt and reported ~54 % style accuracy; their subsequent ArtGAN work (Tan et al., 2018) improved this to ~58 % by leveraging generative training. Chu & Wu (2018) showed that Gram-matrix representations of neural style features could reach ~63 %. More recently, multi-phase patch-based strategies (Bani & Abu-Naser, 2023) have reported high accuracy, though typically on reduced class sets or with micro-averaged metrics that weight common styles more heavily.

Backbone representations. Our fusion approach is motivated by the observation — articulated clearly in recent work on style disentanglement (Jia et al., 2026) — that self-supervised models like DINOv2 (Oquab et al., 2024) and vision-language models like CLIP (Radford et al., 2021) encode fundamentally different aspects of visual style. DINOv2 captures texture and structure; CLIP captures semantic-categorical associations. Cross-attention lets each backbone inform the other, but we should note that this idea is closely related to multi-modal fusion strategies explored in VQA and image-text retrieval.

Workshop attribution. Computational connoisseurship traces back to Lyu et al. (2004), who applied wavelet statistics to distinguish Bruegel from his imitators, and to Johnson et al. (2008), who used canvas-thread analysis for Vermeer attribution. Our Dirichlet-process approach to workshop decomposition is more flexible than these hand-crafted pipelines but has not yet been validated on the expert-curated datasets those studies used.

Method	Style Acc	Artist Acc	Classes	Protocol
ResNet-50 (Tan et al., 2016)	54.5 %	56.5 %	27 / 23	WikiArt subset, weighted avg
ArtGAN (Tan et al., 2018)	58.0 %	—	27	WikiArt, GAN-augmented
Gram matrices (Chu & Wu, 2018)	63.0 %	—	27	WikiArt, micro avg
Deep ensemble (Manzoor et al., 2024)	68.6 %	—	27	WikiArt, stacking ensemble
ArtFusionNet (Kose & Guner, 2025)	99.0 %	—	3	WikiArt subset, 3 styles only
ArtSleuth Fusion · e2e	72.7 %	79.0 %	27 / 129	WikiArt full, 81k, macro avg

Numbers are taken from the respective publications. Direct comparison is difficult: studies differ in the number of classes, dataset splits, averaging methods (micro vs. macro), and whether test sets overlap with training data. We list the protocol details we could verify so readers can judge for themselves.
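
A small worked example shows why the averaging method matters so much on imbalanced data. The sketch below assumes that "macro-averaged accuracy" means the mean of per-class recall; the function names and the toy labels are made up for illustration.

```python
import numpy as np

def micro_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of all samples predicted correctly (large classes dominate)."""
    return float((y_true == y_pred).mean())

def macro_accuracy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean of per-class recall, so every class counts equally."""
    classes = np.unique(y_true)
    per_class = [(y_pred[y_true == c] == c).mean() for c in classes]
    return float(np.mean(per_class))

# Toy data: 90 works in a common style (all correct) and 10 in a rare style
# (only 2 correct, the other 8 mislabelled as the common style).
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.array([0] * 90 + [1] * 2 + [0] * 8)

print(micro_accuracy(y_true, y_pred))  # 92 of 100 correct
print(macro_accuracy(y_true, y_pred))  # average of 1.0 and 0.2
```

The same predictions score 0.92 under micro-averaging but only 0.6 under macro-averaging, which is the kind of gap that makes cross-paper comparisons unreliable.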

Where we fall short — and we know it.

Compute-constrained training. Fine-tuning ran for 5 epochs on a single GPU with 16 GB VRAM. More epochs, larger effective batches, or higher-VRAM GPUs (A100, H100) would very likely improve the numbers. We chose to report what we could reproduce on accessible hardware rather than extrapolate.
Frozen-fusion underperformance. Our frozen cross-attention fusion (65.0 % style) actually trails bare CLIP (67.1 %). The fusion head needs gradient signal from task labels to learn a useful alignment — it does not help out of the box. We report this rather than hide it.
No standardised benchmark protocol. WikiArt classification has no single accepted evaluation protocol. Class counts, splits, and averaging methods vary between papers, which makes apples-to-apples comparison frustratingly difficult. Our numbers use macro-averaging, which is the most conservative choice (each of the 27 styles counts equally, regardless of how many images it contains). Papers that report micro-averaged or weighted scores will appear higher on the same data.
Forgery detection validated on embeddings, not on physical forgeries. We validated the one-class anomaly detector (Mahalanobis distance) on WikiArt across 125 named artists with ≥ 80 works. Mean ROC-AUC: 0.958 (CLIP), 0.897 (fused DINOv2 + CLIP), 0.873 (DINOv2 alone). Median fused AUC is 0.918; three artists reach perfect 1.000. Full per-artist results are in artsleuth/benchmarks/forgery_validation_results.json. However, this evaluates embedding-space separation between different artists — it does not test against actual physical forgeries authenticated by conservators, which is a harder and more practically relevant problem.
Workshop decomposition is unsupervised. The Dirichlet-process model infers "hands" from embedding clusters, but there is no ground-truth labelled dataset of workshop paintings with per-region hand annotations to validate against. Art-historical validation by domain experts is still needed.
School predictions are randomly initialised. Pretrained weights ship for period (27 WikiArt styles) and genre (11 WikiArt genres), but the school axis has no labelled training data yet. School predictions are therefore based on randomly initialised weights and should not be trusted until fine-tuned on an appropriate corpus.
Temporal drift requires dated references. The Gaussian-process date estimator only works for artists whose dated reference embeddings are in the registry. No bundled references are shipped; temporal estimation is disabled by default and has no effect until the user populates a TemporalRegistry.
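
To illustrate the temporal-drift item above, here is a minimal Gaussian-process sketch in NumPy: dated reference embeddings define a smooth trajectory over time, and a query is scored against the embedding predicted for its claimed year. Everything specific here is an assumption for illustration (the RBF kernel, its `length_scale`, the `noise` level, the independent-per-dimension treatment, and the plausibility formula); the source does not describe how ArtSleuth's TemporalRegistry or scoring is implemented.

```python
import numpy as np

def rbf_kernel(t1: np.ndarray, t2: np.ndarray, length_scale: float = 10.0) -> np.ndarray:
    """Squared-exponential kernel over years, with unit prior variance."""
    diff = t1[:, None] - t2[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(years, embeddings, query_year, noise=1e-2, length_scale=10.0):
    """Posterior mean embedding and latent variance at query_year.

    Each embedding dimension is modelled as an independent GP sharing one kernel.
    """
    years = np.asarray(years, dtype=float)
    embeddings = np.asarray(embeddings, dtype=float)
    prior_mean = embeddings.mean(axis=0)
    K = rbf_kernel(years, years, length_scale) + noise * np.eye(len(years))
    k_star = rbf_kernel(np.array([float(query_year)]), years, length_scale)  # (1, n)
    alpha = np.linalg.solve(K, embeddings - prior_mean)                       # (n, dim)
    mean = prior_mean + (k_star @ alpha)[0]
    var = 1.0 - (k_star @ np.linalg.solve(K, k_star.T))[0, 0]
    return mean, var

def temporal_plausibility(query_emb, years, embeddings, claimed_year,
                          noise=1e-2, length_scale=10.0) -> float:
    """Heuristic score in (0, 1]: near 1 when the query matches the predicted
    embedding for the claimed year (assumed formula, for illustration only)."""
    mean, var = gp_predict(years, embeddings, claimed_year, noise, length_scale)
    mse = np.mean((np.asarray(query_emb, dtype=float) - mean) ** 2)
    return float(np.exp(-0.5 * mse / (var + noise)))
```

In this sketch, a work whose embedding resembles an artist's mid-career references scores high for a mid-career date and low for a late date, which is the sense in which temporal plausibility acts as a signal separate from plain similarity.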

We consider these open problems, not failures. Contributions that address any of them — especially expert-curated evaluation datasets — would strengthen the project considerably.

Full reference list

Bani, M. & Abu-Naser, S. S. (2023). Artistic style recognition: combining deep and shallow neural networks for painting classification. Mathematics, 11(22), 4564. doi:10.3390/math11224564
Berenson, B. (1902). The Study and Criticism of Italian Art. George Bell & Sons.
Blei, D. M. & Jordan, M. I. (2006). Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1), 121–143. doi:10.1214/06-BA104
Caron, M. et al. (2021). Emerging properties in self-supervised vision transformers. ICCV. arXiv:2104.14294
Chu, W.-T. & Wu, Y.-L. (2018). Image style classification based on learnt deep correlation features. IEEE Trans. Multimedia, 20(9), 2491–2502. doi:10.1109/TMM.2018.2801718
Jia, Z., Zhang, J. & Zhou, J. (2026). StyleDecoupler: generalizable artistic style disentanglement. arXiv:2601.17697
Johnson, C. R. et al. (2008). Image processing for artist identification. IEEE Signal Processing Magazine, 25(4), 37–48. doi:10.1109/MSP.2008.923513
Jose, J. et al. (2025). DINOv2 meets text: a unified framework for image- and pixel-level vision-language alignment. CVPR. arXiv:2501.00564
Kose, U. & Guner, B. (2025). Enhancing artistic style classification through a novel ArtFusionNet framework. Scientific Reports, 15, 20087. doi:10.1038/s41598-025-04825-y (Note: evaluated on 3 style classes.)
Lyu, S., Rockmore, D. & Farid, H. (2004). A digital technique for art authentication. PNAS, 101(49), 17006–17010. doi:10.1073/pnas.0406398101
Manzoor, T. et al. (2024). Deep ensemble art style recognition. arXiv:2405.11675
Morelli, G. (1890). Italian Painters: Critical Studies of Their Works. John Murray.
Oquab, M. et al. (2024). DINOv2: Learning robust visual features without supervision. TMLR. arXiv:2304.07193
Radford, A. et al. (2021). Learning transferable visual models from natural language supervision. ICML. arXiv:2103.00020
Rasmussen, C. E. & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
Saleh, B. & Elgammal, A. (2016). Large-scale classification of fine-art paintings. JOCCH, 8(4), 1–24. doi:10.1145/2801634
Selvaraju, R. R. et al. (2017). Grad-CAM: visual explanations from deep networks via gradient-based localization. ICCV. doi:10.1109/ICCV.2017.74
Tan, W. R. et al. (2016). Ceci n'est pas une pipe: a deep convolutional network for fine-art paintings classification. ICIP. doi:10.1109/ICIP.2016.7533051
Tan, W. R. et al. (2018). ArtGAN: artwork synthesis with conditional categorical GANs. IEEE Trans. Image Processing, 27(10), 4846–4860. doi:10.1109/TIP.2018.2845388
Vaswani, A. et al. (2017). Attention is all you need. NeurIPS. arXiv:1706.03762

✦ Web Demo

An interactive Gradio interface with five analysis tabs: full pipeline, side-by-side comparison, workshop decomposition, temporal dating, and benchmark dashboard.

pip install artsleuth[web]
artsleuth demo

Or try the live demo on HuggingFace Spaces — no installation required. The GitHub Pages landing page provides an immersive overview of the framework.

✦ MCP Server

ArtSleuth ships as an MCP server, enabling AI assistants to perform art analysis conversationally.

artsleuth server

Tool	Description
`analyze_artwork`	Full analysis pipeline
`classify_style`	Period, school, genre classification
`compare_works`	Side-by-side stylistic comparison
`detect_anomalies`	Forgery screening against a reference corpus

Claude Desktop configuration

{
  "mcpServers": {
    "artsleuth": {
      "command": "artsleuth",
      "args": ["server"]
    }
  }
}

✦ Repository Structure

ArtSleuth/
├── artsleuth/
│   ├── core/                  # Analysis engines
│   │   ├── brushstroke.py     #   Brushstroke pattern extraction
│   │   ├── style.py           #   Style classification
│   │   ├── attribution.py     #   Artist attribution scoring
│   │   ├── forgery.py         #   Anomaly-based forgery detection
│   │   ├── explainability.py  #   Gradient saliency overlays
│   │   ├── temporal.py        #   Temporal style drift (GP)
│   │   ├── workshop.py        #   Bayesian workshop decomposition
│   │   ├── adversarial.py     #   Adversarial robustness testing
│   │   └── pipeline.py        #   Unified analysis orchestrator
│   ├── models/                # Backbone & head architectures
│   │   ├── backbones.py       #   DINOv2 & CLIP loaders
│   │   ├── fusion.py          #   Cross-attention backbone fusion
│   │   ├── heads.py           #   Task-specific linear heads
│   │   └── registry.py        #   HuggingFace model registry
│   ├── preprocessing/         # Art-specific transforms
│   │   ├── transforms.py      #   Varnish, crack, canvas correction
│   │   └── patches.py         #   Grid, salient, adaptive extraction
│   ├── benchmarks/            # Evaluation suite
│   │   ├── wikiart.py         #   WikiArt dataset + linear probes
│   │   └── evaluate.py        #   Multi-backbone comparison runner
│   ├── mcp/                   # MCP server
│   │   └── server.py          #   Tool definitions & handlers
│   ├── cli/                   # Command-line interface
│   │   └── main.py            #   Click-based CLI
│   └── utils/                 # Shared utilities
│       ├── visualization.py   #   Publication-quality figures
│       └── io.py              #   Image loading & saving
├── web/                       # Gradio web demo
│   ├── app.py                 #   Main application (5 tabs)
│   ├── theme.py               #   Custom ArtSleuth theme
│   └── components.py          #   Reusable UI builders
├── tests/                     # Pytest suite (9 test modules)
├── examples/                  # Jupyter notebooks
├── docs/                      # Methodology & guides
├── assets/                    # Visual assets
└── index.html                 # GitHub Pages landing site

✦ Development

git clone https://github.com/ladyFaye1998/ArtSleuth.git
cd ArtSleuth
pip install -e ".[all]"

pytest
ruff check .
mypy artsleuth

✦ Methodology

ArtSleuth draws on two traditions:

Art history — Giovanni Morelli's observation (1890) that an artist's most characteristic habits reside in the least-conscious passages. Bernard Berenson's refinement of this into systematic connoisseurship. The workshop-attribution methodology developed for the Gentileschi debate, where master and assistants each contribute recognisable passages to a shared canvas.

Computer science — Self-supervised vision transformers (Caron et al., 2021; Oquab et al., 2024) that learn rich visual features without task-specific labels. Contrastive vision-language models (Radford et al., 2021) that ground visual concepts in linguistic semantics. Cross-attention fusion (Vaswani et al., 2017; Jose et al., 2025) for multi-modal feature alignment. Dirichlet process mixtures (Blei & Jordan, 2006) for non-parametric clustering. Gaussian processes (Rasmussen & Williams, 2006) for temporal modelling.

The two complement each other: art history provides the questions; machine learning provides a scale of analysis that would be impractical by eye alone.

See docs/methodology.md for the full technical discussion.
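
The Dirichlet process mixtures mentioned above avoid fixing the number of clusters in advance. One way to see why is the truncated stick-breaking construction of the mixture weights, sketched below in NumPy. This illustrates the prior only, not ArtSleuth's inference code; the function names, the truncation level, and the 1 % weight threshold are assumptions for the example.

```python
import numpy as np

def stick_breaking_weights(alpha: float, truncation: int, rng: np.random.Generator) -> np.ndarray:
    """Draw mixture weights from a truncated stick-breaking process.

    Each step breaks off a Beta(1, alpha) fraction of the remaining stick.
    """
    betas = rng.beta(1.0, alpha, size=truncation)
    betas[-1] = 1.0  # the last component takes whatever stick is left
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

def effective_components(weights: np.ndarray, threshold: float = 0.01) -> int:
    """Count components whose weight exceeds the threshold."""
    return int((weights > threshold).sum())
```

A small concentration `alpha` puts most of the mass on a few components (few "hands"), while a large one spreads it over many. In a fitted mixture, the data then determine which components keep non-negligible weight.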

✦ Citation

@software{lesin2026artsleuth,
  author    = {Lesin, Danielle},
  title     = {{ArtSleuth}: Computational Art Analysis Framework},
  year      = {2026},
  url       = {https://github.com/ladyFaye1998/ArtSleuth},
  license   = {MIT}
}

✦ Contributing

Contributions are welcome from art historians, ML researchers, conservators, and anyone interested in computational approaches to cultural heritage.

Area	What's Needed
Reference corpora	Curated, well-attributed image sets for specific artists or periods
Temporal references	Dated works for training the temporal style drift model
Model improvements	Better backbones, training strategies, evaluation benchmarks
Art-historical review	Ensuring taxonomy, terminology, and methodology stay sound
Web UI	Gradio component improvements, accessibility, visualisation refinements
Bug reports	Open an issue with reproduction steps

See CONTRIBUTING.md for guidelines.

Built with 🫖 by Danielle Lesin · Where connoisseurship meets computation

Release history

This version

0.2.1

Mar 24, 2026

0.2.0

Mar 24, 2026

Download files

Source Distribution

artsleuth-0.2.1.tar.gz (44.2 MB)

Uploaded Mar 24, 2026 Source

Built Distribution

artsleuth-0.2.1-py3-none-any.whl (200.7 kB)

Uploaded Mar 24, 2026 Python 3

File details

Details for the file artsleuth-0.2.1.tar.gz.

File metadata

Download URL: artsleuth-0.2.1.tar.gz
Upload date: Mar 24, 2026
Size: 44.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.3

File hashes

Hashes for artsleuth-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`99e4bdbe47a9bb71fc092e070ac4708117372483b1161509c6f4a70b6e857f40`
MD5	`a8e67c9102b3809d79ff04e7fe969256`
BLAKE2b-256	`a3766e8eedfe9ae30a81db0d9149356c0b243a770db07171c55cb6b6d6d51688`

File details

Details for the file artsleuth-0.2.1-py3-none-any.whl.

File metadata

Download URL: artsleuth-0.2.1-py3-none-any.whl
Upload date: Mar 24, 2026
Size: 200.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.3

File hashes

Hashes for artsleuth-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3ab2c8af8d2b5681deda59710d8eab1a1f76c931df29e7ad379417a0d1d4c38c`
MD5	`636b8b75b07c08879fe853a76d16a099`
BLAKE2b-256	`caafb9bd1760e15c9d57e873a626703153b0a6ec22f1288ef49b1c278c0b3190`
