Computational Art Analysis Framework — Brushstroke analysis, style attribution, anomaly screening, and interpretable visual explanations powered by vision transformers.
✦ About
ArtSleuth is a computational art-analysis framework that formalises what connoisseurs have done for centuries — examining the physical evidence a painter leaves on a canvas — using machine learning.
Brushstroke directionality, impasto relief, palette temperature, the habitual gestures that reside in the least-scrutinised passages of a painting — drapery folds, background foliage, the rendering of earlobes. These are the signals that distinguish one hand from another, and they map naturally onto what self-supervised vision transformers learn to encode.
| Capability | Method |
|---|---|
| Stroke orientation, coherence, energy, curvature with patch-level clustering | Structure tensor on image gradients + DINOv2 patch embeddings |
| Period, school, and genre prediction | CLIP embeddings through learned linear heads (period and genre pretrained; school randomly initialised) |
| Embedding-space comparison with temporal plausibility scoring | Cosine similarity with GP-based date estimation |
| Bayesian inference of distinct hands in collaborative paintings | Dirichlet process Gaussian mixture model |
| One-class anomaly scoring with adversarial robustness testing | Mahalanobis distance plus historical forgery simulation |
| Complementary features from two vision transformers | Concatenation at inference; cross-attention available for training |
| Models how an artist's style evolves over decades | Gaussian process regression in embedding space (requires user-supplied dated references) |
| Visual heatmaps highlighting regions the model considers salient | Gradient-based saliency maps |
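To make the first row of the table concrete, here is a minimal NumPy sketch of structure-tensor features (orientation, coherence, energy) on a greyscale image. It is an illustration under stated assumptions, not ArtSleuth's implementation: the function names `box_blur` and `structure_tensor_features`, the box-filter smoothing, and the window size are all hypothetical choices.

```python
import numpy as np

def box_blur(a: np.ndarray, k: int = 5) -> np.ndarray:
    """Average over a k x k window with edge padding (assumed smoothing choice)."""
    pad = k // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (k * k)

def structure_tensor_features(gray: np.ndarray, k: int = 5):
    """Per-pixel stroke orientation (radians), coherence in [0, 1], and energy."""
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    gy, gx = np.gradient(gray.astype(float))
    jxx = box_blur(gx * gx, k)
    jyy = box_blur(gy * gy, k)
    jxy = box_blur(gx * gy, k)
    # Angle of the dominant gradient direction; strokes run perpendicular to it.
    gradient_angle = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    orientation = gradient_angle + np.pi / 2.0
    energy = jxx + jyy  # trace of the tensor = sum of its eigenvalues
    # (l1 - l2) / (l1 + l2) expressed through the tensor entries.
    coherence = np.sqrt((jxx - jyy) ** 2 + 4.0 * jxy ** 2) / (energy + 1e-12)
    return orientation, coherence, energy

# Synthetic "painting": vertical stripes, so every stroke should be vertical.
xs = np.arange(64)
stripes = np.tile(np.sin(0.5 * xs), (64, 1))
orientation, coherence, energy = structure_tensor_features(stripes)
```

On the striped test image the intensity changes only along x, so the recovered stroke orientation is π/2 (vertical) and coherence is close to 1. A region of mixed, overlapping strokes would instead give coherence near 0.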
✦ What's Novel
ArtSleuth combines several techniques that are typically studied in isolation:
- Style-Guided Cross-Attention Fusion: CLIP's semantic understanding directs DINOv2's patch-level attention via multi-head cross-attention with learned temperature, producing fused features neither backbone achieves alone.
- Temporal Style Drift Modelling: Gaussian process regression over time-stamped reference embeddings captures how an artist's hand evolves across decades, reporting temporal plausibility as a separate signal. Requires user-supplied dated references; no bundled data is shipped.
- Hierarchical Workshop Decomposition: A Dirichlet process Gaussian mixture model automatically infers the number of distinct hands in a painting, replacing flat k-means with art-historically grounded probabilistic clustering.
- Adversarial Forgery Robustness: Stress-tests detection against simulated historical forgery techniques (artificial aging, style transfer perturbation, material anachronism) at multiple severity levels.
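The cross-attention idea in the first bullet can be sketched in a few lines of NumPy. This is a simplified single-head version with random projection matrices and a fixed temperature. The framework itself describes multi-head attention with a learned temperature, so the function `style_guided_attention`, the dimensions, and the weights below are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def style_guided_attention(clip_vec, dino_patches, w_q, w_k, w_v, temperature=1.0):
    """Single-head cross-attention: the CLIP embedding queries the DINOv2 patch tokens."""
    q = clip_vec @ w_q        # (d,)   query from the global CLIP embedding
    k = dino_patches @ w_k    # (n, d) keys from patch tokens
    v = dino_patches @ w_v    # (n, d) values from patch tokens
    scores = (k @ q) / (np.sqrt(q.shape[0]) * temperature)
    weights = softmax(scores)  # how much each patch contributes
    fused = weights @ v        # (d,)   attention-weighted patch summary
    return fused, weights

# Hypothetical sizes: 256 patches, 32-d inputs, 16-d attention space.
rng = np.random.default_rng(0)
n_patches, d_in, d_attn = 256, 32, 16
clip_vec = rng.normal(size=d_in)
dino_patches = rng.normal(size=(n_patches, d_in))
w_q, w_k, w_v = (rng.normal(size=(d_in, d_attn)) * 0.1 for _ in range(3))

fused, weights = style_guided_attention(clip_vec, dino_patches, w_q, w_k, w_v, temperature=0.5)
```

A lower temperature sharpens the attention weights towards a few patches, while a higher one spreads them more evenly. In a trained model the projections and temperature would be learned from task labels rather than sampled at random.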
✦ Quick Start
No installation required? Try ArtSleuth live on HuggingFace Spaces. Pretrained weights are on the HuggingFace Hub.
pip install artsleuth
Python
import artsleuth
result = artsleuth.analyze("judith_slaying_holofernes.jpg")
print(result.summary())
explanation = result.explain()
explanation.save("analysis_overlay.png")
CLI
artsleuth analyze painting.jpg
artsleuth style painting.jpg --top-k 5
artsleuth compare painting_a.jpg painting_b.jpg
artsleuth workshop painting.jpg
artsleuth robustness painting.jpg -r "Artemisia Gentileschi"
artsleuth benchmark --backbone dinov2 --backbone fusion
artsleuth demo
✦ Architecture
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1A2E48', 'primaryTextColor': '#F0F0F0', 'primaryBorderColor': '#9DC0D8', 'lineColor': '#9DC0D8', 'secondaryColor': '#1A2E48', 'tertiaryColor': '#1A2E48', 'edgeLabelBackground': '#0D1117', 'clusterBkg': '#0D1117', 'clusterBorder': '#9DC0D8', 'titleColor': '#9DC0D8'}}}%%
graph TD
%% Temporal drift is optional (off by default; needs user TemporalRegistry data)
Input["Artwork Image"] --> Resize["Resize · Crop · Normalise"]
Resize --> Patches["Patch Extraction"]
Resize --> FullImage["Full-Image Encoding"]
Patches --> DINO["DINOv2"]
FullImage --> CLIPEnc["CLIP"]
DINO --> Brushstroke["Brushstroke Analysis"]
DINO --> WorkshopNode["Workshop Decomposition"]
CLIPEnc --> Style["Style Classification"]
DINO --> Concat["Feature Concatenation"]
CLIPEnc --> Concat
Concat --> Attribution["Attribution Scoring"]
Attribution -.->|optional| Temporal["Temporal Drift Model"]
Concat --> Forgery["Forgery Detection"]
Concat -.->|optional| Adversarial["Adversarial Robustness"]
DINO --> Explain["Saliency Maps"]
Forgery --> Report["Analysis Report"]
Attribution --> Report
Temporal -.-> Report
Style --> Report
Brushstroke --> Report
WorkshopNode --> Report
Explain --> Report
Report --> WebUI["Web UI"]
Report --> CLI["CLI"]
Report --> MCP["MCP Server"]
The inference pipeline concatenates CLIP + DINOv2 embeddings. Cross-attention fusion is available as a training-time architecture (used in the benchmark fine-tuning) but is not part of the default inference path.
| Backbone | Strength | Used For |
|---|---|---|
| DINOv2 | Fine-grained texture and structure | Brushstroke analysis · patch embeddings |
| CLIP | Semantic-stylistic understanding | Style classification · style embeddings |
| Concat | Complementary feature combination | Attribution · forgery detection |
Default backbone sizes: DINOv2 ViT-B/14 + CLIP ViT-L/14. First run downloads ~1–2 GB of pretrained weights from HuggingFace Hub.
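A minimal sketch of concatenation fusion, followed by a cosine-similarity comparison between two works, is shown below. The helper names and the choice to L2-normalise each backbone's embedding before concatenating are assumptions for illustration (so neither backbone dominates by scale); the source does not state how ArtSleuth normalises embeddings, and the vectors here are random stand-ins for real backbone outputs.

```python
import numpy as np

def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit length."""
    return v / (np.linalg.norm(v) + eps)

def concat_fusion(dino_vec: np.ndarray, clip_vec: np.ndarray) -> np.ndarray:
    """Concatenate unit-normalised DINOv2 and CLIP embeddings (assumed normalisation)."""
    return np.concatenate([l2_normalize(dino_vec), l2_normalize(clip_vec)])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embeddings."""
    return float(l2_normalize(a) @ l2_normalize(b))

# Random stand-ins for the embeddings of two paintings (hypothetical sizes).
rng = np.random.default_rng(0)
dino_a, clip_a = rng.normal(size=64), rng.normal(size=48)
dino_b, clip_b = rng.normal(size=64), rng.normal(size=48)

fused_a = concat_fusion(dino_a, clip_a)
fused_b = concat_fusion(dino_b, clip_b)
similarity = cosine_similarity(fused_a, fused_b)
```

With this normalisation the cosine similarity of the fused vectors equals the average of the per-backbone cosine similarities, which makes the contribution of each backbone easy to reason about.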
✦ Benchmark
Linear probe and end-to-end evaluation on the full WikiArt dataset (81 444 images, 80/20 split, seed 42):
| Backbone | Style Acc | Style F1 | Artist Acc | Artist Top-5 | Genre Acc |
|---|---|---|---|---|---|
| DINOv2 · ViT-B/14 | 57.5 % | 0.553 | 64.7 % | 90.9 % | 71.0 % |
| CLIP · ViT-L/14 | 67.1 % | 0.656 | 74.6 % | 95.9 % | 75.0 % |
| Fusion · frozen | 65.0 % | 0.633 | 71.0 % | 94.2 % | 74.2 % |
| Fusion · fine-tuned † | 71.6 % | 0.703 | 77.8 % | 96.2 % | 75.1 % |
| Fusion · e2e † | 72.7 % | — | 79.0 % | 96.9 % | 76.6 % |
Top three rows: logistic-regression linear probes on frozen backbones (macro-averaged across 27 styles, 129 artists, 11 genres), reproducible via artsleuth/benchmarks/wikiart.py. † Bottom two rows: separate training runs with partial backbone unfreezing (last 3 transformer blocks), a multi-task cross-entropy plus supervised contrastive loss, AdamW, and mixed precision (5 epochs, effective batch size 64). The training code is not included in this repository; these numbers are reported for context.
Reproduce the frozen linear-probe benchmarks:
pip install artsleuth[benchmarks]
artsleuth benchmark --device cuda
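Because the F1 scores above are macro-averaged, rare classes count as much as common ones. The toy example below (not WikiArt data; the labels and the helper `per_class_f1` are made up for illustration) shows how a classifier that always predicts the majority class can score high accuracy while its macro F1 stays low.

```python
import numpy as np

def per_class_f1(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int) -> np.ndarray:
    """F1 score for each class; a class with no true or predicted samples scores 0."""
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return np.array(scores, dtype=float)

# Imbalanced toy labels: 90 works in a common style, 10 in a rare one.
y_true = np.array([0] * 90 + [1] * 10)
y_pred = np.zeros(100, dtype=int)  # always predict the common style

accuracy = float(np.mean(y_true == y_pred))            # micro-averaged view
macro_f1 = float(per_class_f1(y_true, y_pred, 2).mean())  # each class weighted equally
```

Here accuracy is 0.90 while macro F1 is roughly 0.47, because the rare class contributes an F1 of zero. This is why macro-averaged figures tend to look lower than micro-averaged or weighted ones on the same predictions.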
Forgery detection validation (one-class authentication)
For each of 125 named artists (≥ 80 works, excluding the catch-all "Unknown Artist" category), we fit a Mahalanobis-distance reference model from 80 % of their authenticated WikiArt works, then test whether held-out genuine paintings score lower (closer to the reference distribution) than impostor paintings by other artists. ROC-AUC = 1.0 means perfect separation; 0.5 means chance.
| Metric | DINOv2 ViT-B/14 | CLIP ViT-L/14 | Fused (concat) |
|---|---|---|---|
| Mean AUC | 0.873 | 0.958 | 0.897 |
| Median AUC | 0.895 | 0.970 | 0.918 |
| AUC ≥ 0.95 | 28 / 125 artists | 81 / 125 artists | 36 / 125 artists |
| AUC ≥ 0.90 | 62 / 125 artists | 113 / 125 artists | 75 / 125 artists |
Top 15 and bottom 5 by fused AUC:
| Artist | Works | DINOv2 | CLIP | Fused |
|---|---|---|---|---|
| Sam Francis | 317 | 1.000 | 1.000 | 1.000 |
| Antoine Blanchard | 170 | 1.000 | 1.000 | 1.000 |
| Gene Davis | 155 | 1.000 | 1.000 | 1.000 |
| Fra Angelico | 167 | 0.995 | 0.990 | 0.998 |
| Juan Gris | 196 | 0.993 | 0.996 | 0.998 |
| Frans Hals | 176 | 0.992 | 0.999 | 0.997 |
| Édouard Cortès | 214 | 0.992 | 1.000 | 0.995 |
| El Greco | 159 | 0.981 | 1.000 | 0.993 |
| Fernand Léger | 223 | 0.987 | 0.976 | 0.993 |
| Anthony van Dyck | 163 | 0.985 | 1.000 | 0.992 |
| Maxime Maufra | 119 | 0.991 | 0.998 | 0.991 |
| Joshua Reynolds | 200 | 0.976 | 0.999 | 0.989 |
| Henri Fantin-Latour | 105 | 0.977 | 1.000 | 0.989 |
| Ivan Aivazovsky | 577 | 0.980 | 0.998 | 0.986 |
| Gustave Moreau | 83 | 0.965 | 1.000 | 0.983 |
| … | | | | |
| Salvador Dalí | 479 | 0.675 | 0.876 | 0.725 |
| Vasily Polenov | 225 | 0.684 | 0.926 | 0.723 |
| Jacek Malczewski | 91 | 0.659 | 0.961 | 0.709 |
| Mikhail Vrubel | 95 | 0.618 | 0.823 | 0.654 |
| M. C. Escher | 126 | 0.610 | 0.899 | 0.649 |
Mahalanobis-distance one-class classification on WikiArt (125 named artists, 80/20 split, equal genuine/impostor test sets, seed 42). Artists with distinctive visual signatures (El Greco, Fra Angelico, ukiyo-e prints) approach perfect separation; stylistically versatile artists (Dalí, Escher) are harder to model as a single distribution. Full per-artist results in artsleuth/benchmarks/forgery_validation_results.json.
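The validation protocol above can be sketched as follows: fit a mean and covariance on reference embeddings, score held-out genuine and impostor embeddings by Mahalanobis distance, and compute ROC-AUC from the two score sets. The data here is synthetic Gaussian noise rather than WikiArt embeddings, and the function names and the covariance shrinkage term are assumptions added to keep the inverse stable, not details taken from ArtSleuth.

```python
import numpy as np

def fit_reference(embeddings: np.ndarray, shrinkage: float = 0.1):
    """Estimate mean and (shrunk) inverse covariance of an artist's reference set."""
    mean = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False)
    d = cov.shape[0]
    # Shrink towards a scaled identity so the matrix stays invertible (assumed choice).
    cov = (1.0 - shrinkage) * cov + shrinkage * (np.trace(cov) / d) * np.eye(d)
    return mean, np.linalg.inv(cov)

def mahalanobis(x: np.ndarray, mean: np.ndarray, precision: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of each row of x from the reference distribution."""
    diff = x - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, precision, diff))

def roc_auc(genuine_scores, impostor_scores) -> float:
    """Probability that a random impostor scores higher than a random genuine work."""
    g = np.asarray(genuine_scores)[:, None]
    i = np.asarray(impostor_scores)[None, :]
    return float(np.mean((i > g) + 0.5 * (i == g)))  # ties count half

# Synthetic stand-ins: genuine works cluster at the origin, impostors are shifted.
rng = np.random.default_rng(42)
reference = rng.normal(0.0, 1.0, size=(200, 8))
held_out_genuine = rng.normal(0.0, 1.0, size=(50, 8))
impostors = rng.normal(3.0, 1.0, size=(50, 8))

mean, precision = fit_reference(reference)
auc = roc_auc(mahalanobis(held_out_genuine, mean, precision),
              mahalanobis(impostors, mean, precision))
```

An AUC near 1.0 means impostors almost always lie farther from the reference distribution than genuine works. With real embeddings the separation depends on how tightly an artist's works cluster, which is consistent with the lower scores reported for stylistically versatile artists.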
✦ Related Work & Honest Limitations
Automated art classification has a rich history, and ArtSleuth builds on prior work that we want to acknowledge properly.
Prior art in style classification. Saleh & Elgammal (2016) were among the first to apply metric learning to large-scale art datasets. Tan et al. (2016) trained a ResNet-50 on WikiArt and reported ~54 % style accuracy; their subsequent ArtGAN work (Tan et al., 2018) improved this to ~58 % by leveraging generative training. Chu & Wu (2018) showed that Gram-matrix representations of neural style features could reach ~63 %. More recently, multi-phase patch-based strategies (Bani & Abu-Naser, 2023) have reported high accuracy, though typically on reduced class sets or with micro-averaged metrics that weight common styles more heavily.
Backbone representations. Our fusion approach is motivated by the observation — articulated clearly in recent work on style disentanglement (Jia et al., 2026) — that self-supervised models like DINOv2 (Oquab et al., 2024) and vision-language models like CLIP (Radford et al., 2021) encode fundamentally different aspects of visual style. DINOv2 captures texture and structure; CLIP captures semantic-categorical associations. Cross-attention lets each backbone inform the other, but we should note that this idea is closely related to multi-modal fusion strategies explored in VQA and image-text retrieval.
Workshop attribution. Computational connoisseurship traces back to Lyu et al. (2004), who applied wavelet statistics to distinguish Bruegel from his imitators, and to Johnson et al. (2008), who used canvas-thread analysis for Vermeer attribution. Our Dirichlet-process approach to workshop decomposition is more flexible than these hand-crafted pipelines but has not yet been validated on the expert-curated datasets those studies used.
| Method | Style Acc | Artist Acc | Classes | Protocol |
|---|---|---|---|---|
| ResNet-50 (Tan et al., 2016) | 54.5 % | 56.5 % | 27 / 23 | WikiArt subset, weighted avg |
| ArtGAN (Tan et al., 2018) | 58.0 % | — | 27 | WikiArt, GAN-augmented |
| Gram matrices (Chu & Wu, 2018) | 63.0 % | — | 27 | WikiArt, micro avg |
| Deep ensemble (Manzoor et al., 2024) | 68.6 % | — | 27 | WikiArt, stacking ensemble |
| ArtFusionNet (Kose & Guner, 2025) | 99.0 % | — | 3 | WikiArt subset, 3 styles only |
| ArtSleuth Fusion · e2e | 72.7 % | 79.0 % | 27 / 129 | WikiArt full, 81k, macro avg |
Numbers are taken from the respective publications. Direct comparison is difficult: studies differ in the number of classes, dataset splits, averaging methods (micro vs. macro), and whether test sets overlap with training data. We list the protocol details we could verify so readers can judge for themselves.
Where we fall short — and we know it.
- Compute-constrained training. Fine-tuning ran for 5 epochs on a single GPU with 16 GB VRAM. More epochs, larger effective batches, or higher-VRAM GPUs (A100, H100) would very likely improve the numbers. We chose to report what we could reproduce on accessible hardware rather than extrapolate.
- Frozen-fusion underperformance. Our frozen cross-attention fusion (65.0 % style) actually trails bare CLIP (67.1 %). The fusion head needs gradient signal from task labels to learn a useful alignment; it does not help out of the box. We report this rather than hide it.
- No standardised benchmark protocol. WikiArt classification has no single accepted evaluation protocol. Class counts, splits, and averaging methods vary between papers, which makes apples-to-apples comparison frustratingly difficult. Our numbers use macro-averaging, which is the most conservative choice (each of the 27 styles counts equally, regardless of how many images it contains). Papers that report micro-averaged or weighted scores will appear higher on the same data.
- Forgery detection validated on embeddings, not on physical forgeries. We validated the one-class anomaly detector (Mahalanobis distance) on WikiArt across 125 named artists with ≥ 80 works. Mean ROC-AUC: 0.958 (CLIP), 0.897 (fused DINOv2 + CLIP), 0.873 (DINOv2 alone). Median fused AUC is 0.918; three artists reach perfect 1.000. Full per-artist results are in artsleuth/benchmarks/forgery_validation_results.json. However, this evaluates embedding-space separation between different artists; it does not test against actual physical forgeries authenticated by conservators, which is a harder and more practically relevant problem.
- Workshop decomposition is unsupervised. The Dirichlet-process model infers "hands" from embedding clusters, but there is no ground-truth labelled dataset of workshop paintings with per-region hand annotations to validate against. Art-historical validation by domain experts is still needed.
- School predictions are randomly initialised. Pretrained weights ship for period (27 WikiArt styles) and genre (11 WikiArt genres), but the school axis has no labelled training data yet. School predictions are therefore based on randomly initialised weights and should not be trusted until fine-tuned on an appropriate corpus.
- Temporal drift requires dated references. The Gaussian-process date estimator only works for artists whose dated reference embeddings are in the registry. No bundled references are shipped; temporal estimation is disabled by default and has no effect until the user populates a TemporalRegistry.
We consider these open problems, not failures. Contributions that address any of them — especially expert-curated evaluation datasets — would strengthen the project considerably.
Full reference list
- Bani, M. & Abu-Naser, S. S. (2023). Artistic style recognition: combining deep and shallow neural networks for painting classification. Mathematics, 11(22), 4564. doi:10.3390/math11224564
- Berenson, B. (1902). The Study and Criticism of Italian Art. George Bell & Sons.
- Blei, D. M. & Jordan, M. I. (2006). Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1), 121–143. doi:10.1214/06-BA104
- Caron, M. et al. (2021). Emerging properties in self-supervised vision transformers. ICCV. arXiv:2104.14294
- Chu, W.-T. & Wu, Y.-L. (2018). Image style classification based on learnt deep correlation features. IEEE Trans. Multimedia, 20(9), 2491–2502. doi:10.1109/TMM.2018.2801718
- Jia, Z., Zhang, J. & Zhou, J. (2026). StyleDecoupler: generalizable artistic style disentanglement. arXiv:2601.17697
- Johnson, C. R. et al. (2008). Image processing for artist identification. IEEE Signal Processing Magazine, 25(4), 37–48. doi:10.1109/MSP.2008.923513
- Jose, J. et al. (2025). DINOv2 meets text: a unified framework for image- and pixel-level vision-language alignment. CVPR. arXiv:2501.00564
- Kose, U. & Guner, B. (2025). Enhancing artistic style classification through a novel ArtFusionNet framework. Scientific Reports, 15, 20087. doi:10.1038/s41598-025-04825-y (Note: evaluated on 3 style classes.)
- Lyu, S., Rockmore, D. & Farid, H. (2004). A digital technique for art authentication. PNAS, 101(49), 17006–17010. doi:10.1073/pnas.0406398101
- Manzoor, T. et al. (2024). Deep ensemble art style recognition. arXiv:2405.11675
- Morelli, G. (1890). Italian Painters: Critical Studies of Their Works. John Murray.
- Oquab, M. et al. (2024). DINOv2: Learning robust visual features without supervision. TMLR. arXiv:2304.07193
- Radford, A. et al. (2021). Learning transferable visual models from natural language supervision. ICML. arXiv:2103.00020
- Rasmussen, C. E. & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
- Saleh, B. & Elgammal, A. (2016). Large-scale classification of fine-art paintings. JOCCH, 8(4), 1–24. doi:10.1145/2801634
- Selvaraju, R. R. et al. (2017). Grad-CAM: visual explanations from deep networks via gradient-based localization. ICCV. doi:10.1109/ICCV.2017.74
- Tan, W. R. et al. (2016). Ceci n'est pas une pipe: a deep convolutional network for fine-art paintings classification. ICIP. doi:10.1109/ICIP.2016.7533051
- Tan, W. R. et al. (2018). ArtGAN: artwork synthesis with conditional categorical GANs. IEEE Trans. Image Processing, 27(10), 4846–4860. doi:10.1109/TIP.2018.2845388
- Vaswani, A. et al. (2017). Attention is all you need. NeurIPS. arXiv:1706.03762
✦ Web Demo
An interactive Gradio interface with five analysis tabs: full pipeline, side-by-side comparison, workshop decomposition, temporal dating, and benchmark dashboard.
pip install artsleuth[web]
artsleuth demo
Or try the live demo on HuggingFace Spaces — no installation required. The GitHub Pages landing page provides an immersive overview of the framework.
✦ MCP Server
ArtSleuth ships as an MCP server, enabling AI assistants to perform art analysis conversationally.
artsleuth server
| Tool | Description |
|---|---|
| analyze_artwork | Full analysis pipeline |
| classify_style | Period, school, genre classification |
| compare_works | Side-by-side stylistic comparison |
| detect_anomalies | Forgery screening against a reference corpus |
Claude Desktop configuration
{
  "mcpServers": {
    "artsleuth": {
      "command": "artsleuth",
      "args": ["server"]
    }
  }
}
✦ Repository Structure
ArtSleuth/
├── artsleuth/
│   ├── core/                    # Analysis engines
│   │   ├── brushstroke.py       # Brushstroke pattern extraction
│   │   ├── style.py             # Style classification
│   │   ├── attribution.py       # Artist attribution scoring
│   │   ├── forgery.py           # Anomaly-based forgery detection
│   │   ├── explainability.py    # Gradient saliency overlays
│   │   ├── temporal.py          # Temporal style drift (GP)
│   │   ├── workshop.py          # Bayesian workshop decomposition
│   │   ├── adversarial.py       # Adversarial robustness testing
│   │   └── pipeline.py          # Unified analysis orchestrator
│   ├── models/                  # Backbone & head architectures
│   │   ├── backbones.py         # DINOv2 & CLIP loaders
│   │   ├── fusion.py            # Cross-attention backbone fusion
│   │   ├── heads.py             # Task-specific linear heads
│   │   └── registry.py          # HuggingFace model registry
│   ├── preprocessing/           # Art-specific transforms
│   │   ├── transforms.py        # Varnish, crack, canvas correction
│   │   └── patches.py           # Grid, salient, adaptive extraction
│   ├── benchmarks/              # Evaluation suite
│   │   ├── wikiart.py           # WikiArt dataset + linear probes
│   │   └── evaluate.py          # Multi-backbone comparison runner
│   ├── mcp/                     # MCP server
│   │   └── server.py            # Tool definitions & handlers
│   ├── cli/                     # Command-line interface
│   │   └── main.py              # Click-based CLI
│   └── utils/                   # Shared utilities
│       ├── visualization.py     # Publication-quality figures
│       └── io.py                # Image loading & saving
├── web/                         # Gradio web demo
│   ├── app.py                   # Main application (5 tabs)
│   ├── theme.py                 # Custom ArtSleuth theme
│   └── components.py            # Reusable UI builders
├── tests/                       # Pytest suite (9 test modules)
├── examples/                    # Jupyter notebooks
├── docs/                        # Methodology & guides
├── assets/                      # Visual assets
└── index.html                   # GitHub Pages landing site
✦ Development
git clone https://github.com/ladyFaye1998/ArtSleuth.git
cd ArtSleuth
pip install -e ".[all]"
pytest
ruff check .
mypy artsleuth
✦ Methodology
ArtSleuth draws on two traditions:
Art history — Giovanni Morelli's observation (1890) that an artist's most characteristic habits reside in the least-conscious passages. Bernard Berenson's refinement of this into systematic connoisseurship. The workshop-attribution methodology developed for the Gentileschi debate, where master and assistants each contribute recognisable passages to a shared canvas.
Computer science — Self-supervised vision transformers (Caron et al., 2021; Oquab et al., 2024) that learn rich visual features without task-specific labels. Contrastive vision-language models (Radford et al., 2021) that ground visual concepts in linguistic semantics. Cross-attention fusion (Vaswani et al., 2017; Jose et al., 2025) for multi-modal feature alignment. Dirichlet process mixtures (Blei & Jordan, 2006) for non-parametric clustering. Gaussian processes (Rasmussen & Williams, 2006) for temporal modelling.
The two complement each other: art history provides the questions; machine learning provides a scale of analysis that would be impractical by eye alone.
See docs/methodology.md for the full technical discussion.
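As a rough illustration of the Gaussian-process temporal modelling mentioned above, the sketch below fits a GP (following the standard Cholesky-based posterior from Rasmussen & Williams, 2006) to one scalar style coordinate observed at dated reference works, then queries it at other years. Everything specific here is an assumption: the RBF kernel and its hyperparameters, the reduction of an embedding to a single centred coordinate, the example years, and the function names. It does not reproduce ArtSleuth's temporal.py.

```python
import numpy as np

def rbf(a: np.ndarray, b: np.ndarray, length_scale: float = 10.0, variance: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel between two sets of years."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior(x_train, y_train, x_query, noise: float = 0.05):
    """Posterior mean and variance of a zero-mean GP at the query years."""
    k = rbf(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    k_s = rbf(x_train, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.diag(rbf(x_query, x_query)) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

# Hypothetical dated references: one centred style coordinate drifting over 40 years.
years = np.array([1610.0, 1620.0, 1630.0, 1640.0, 1650.0])
style_coord = np.array([-0.4, -0.2, 0.0, 0.2, 0.4])

query_years = np.array([1625.0, 1700.0])
mean, var = gp_posterior(years, style_coord, query_years)

# A simple plausibility signal: how many posterior standard deviations an
# observed coordinate lies from the prediction for the claimed year.
observed = -0.1
z_score = abs(observed - mean[0]) / np.sqrt(var[0] + 0.05 ** 2)
```

The posterior variance is small inside the range of dated references and grows back towards the prior variance far outside it, so a claimed date of 1700 yields a much less informative prediction than 1625. That behaviour is one reason the estimator is only meaningful for artists with dated references.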
✦ Citation
@software{lesin2026artsleuth,
  author  = {Lesin, Danielle},
  title   = {{ArtSleuth}: Computational Art Analysis Framework},
  year    = {2026},
  url     = {https://github.com/ladyFaye1998/ArtSleuth},
  license = {MIT}
}
✦ Contributing
Contributions are welcome from art historians, ML researchers, conservators, and anyone interested in computational approaches to cultural heritage.
| Area | What's Needed |
|---|---|
| Reference corpora | Curated, well-attributed image sets for specific artists or periods |
| Temporal references | Dated works for training the temporal style drift model |
| Model improvements | Better backbones, training strategies, evaluation benchmarks |
| Art-historical review | Ensuring taxonomy, terminology, and methodology stay sound |
| Web UI | Gradio component improvements, accessibility, visualisation refinements |
| Bug reports | Open an issue with reproduction steps |
See CONTRIBUTING.md for guidelines.
Built with 🫖 by Danielle Lesin · Where connoisseurship meets computation