
Computational Art Analysis Framework — Brushstroke analysis, style attribution, anomaly screening, and interpretable visual explanations powered by vision transformers.

✦ About

ArtSleuth is a computational art-analysis framework that formalises what connoisseurs have done for centuries — examining the physical evidence a painter leaves on a canvas — using machine learning.

Brushstroke directionality, impasto relief, palette temperature, the habitual gestures that reside in the least-scrutinised passages of a painting — drapery folds, background foliage, the rendering of earlobes. These are the signals that distinguish one hand from another, and they map naturally onto what self-supervised vision transformers learn to encode.


| Module | Capability | Method |
| --- | --- | --- |
| Brushstroke | Stroke orientation, coherence, energy, curvature with patch-level clustering | Structure tensor on image gradients + DINOv2 patch embeddings |
| Style | Period, school, and genre prediction | CLIP embeddings through learned linear heads (period and genre pretrained; school randomly initialised) |
| Attribution | Embedding-space comparison with temporal plausibility scoring | Cosine similarity with GP-based date estimation |
| Workshop | Bayesian inference of distinct hands in collaborative paintings | Dirichlet process Gaussian mixture model |
| Forgery | One-class anomaly scoring with adversarial robustness testing | Mahalanobis distance plus historical forgery simulation |
| Fusion | Complementary features from two vision transformers | Concatenation at inference; cross-attention available for training |
| Temporal | Models how an artist's style evolves over decades | Gaussian process regression in embedding space (requires user-supplied dated references) |
| Explainability | Visual heatmaps highlighting regions the model considers salient | Gradient-based saliency maps |
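To make the brushstroke row more concrete, here is a minimal sketch of how a structure tensor yields stroke orientation, coherence, and energy for one grayscale patch. It is an illustration only: `structure_tensor_stats` is a hypothetical helper, not part of the ArtSleuth API, it uses plain central differences with no smoothing, and it leaves out curvature and the DINOv2 clustering step.

```python
import math

def structure_tensor_stats(patch):
    """Return (stroke_orientation_rad, coherence, energy) for a 2-D grayscale patch.

    `patch` is a list of rows of floats. Hypothetical helper for illustration.
    """
    h, w = len(patch), len(patch[0])
    jxx = jyy = jxy = 0.0
    # Accumulate the structure tensor from central-difference gradients.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            jxx += gx * gx
            jyy += gy * gy
            jxy += gx * gy
    energy = jxx + jyy  # trace of the tensor: total gradient energy
    if energy == 0.0:
        return 0.0, 0.0, 0.0  # flat patch, no measurable direction
    # Dominant gradient direction; strokes run perpendicular to it.
    gradient_angle = 0.5 * math.atan2(2.0 * jxy, jxx - jyy)
    stroke_angle = (gradient_angle + math.pi / 2.0) % math.pi
    # (lambda1 - lambda2) / (lambda1 + lambda2): 1 = one direction, 0 = isotropic.
    coherence = math.hypot(jxx - jyy, 2.0 * jxy) / energy
    return stroke_angle, coherence, energy

# Synthetic patch with vertical stripes: intensity varies only along x.
stripes = [[math.sin(0.5 * x) for x in range(16)] for _ in range(16)]
orientation, coherence, energy = structure_tensor_stats(stripes)
print(f"orientation={math.degrees(orientation):.1f} deg, coherence={coherence:.2f}")
# orientation=90.0 deg, coherence=1.00
```

A coherence near 1 means the strokes in the patch share one direction; values near 0 indicate no preferred direction.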



✦ What's Novel

ArtSleuth combines several techniques that are typically studied in isolation:

  1. Style-Guided Cross-Attention Fusion — CLIP's semantic understanding directs DINOv2's patch-level attention via multi-head cross-attention with learned temperature, producing fused features neither backbone achieves alone.

  2. Temporal Style Drift Modelling — Gaussian process regression over time-stamped reference embeddings captures how an artist's hand evolves across decades, reporting temporal plausibility as a separate signal. Requires user-supplied dated references; no bundled data is shipped.

  3. Hierarchical Workshop Decomposition — A Dirichlet process Gaussian mixture model automatically infers the number of distinct hands in a painting, replacing flat k-means with art-historically grounded probabilistic clustering.

  4. Adversarial Forgery Robustness — Stress-tests detection against simulated historical forgery techniques (artificial aging, style transfer perturbation, material anachronism) at multiple severity levels.
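As a rough illustration of the first item, the sketch below shows a single attention head in which a CLIP vector acts as the query over DINOv2 patch tokens, with a temperature that sharpens or flattens the attention weights. It is a simplified stand-in, not ArtSleuth's fusion module: the projection matrices are random rather than learned, the temperature is a fixed scalar rather than a learned parameter, there is only one head, and all names and dimensions are hypothetical.

```python
import numpy as np

def cross_attention_fuse(clip_vec, dino_patches, w_q, w_k, w_v, temperature=1.0):
    """Single-head cross-attention: one CLIP query attends over DINOv2 patches.

    Returns the attended patch feature and the per-patch attention weights.
    """
    q = clip_vec @ w_q        # (d,)   query from the CLIP embedding
    k = dino_patches @ w_k    # (n, d) keys from the patch tokens
    v = dino_patches @ w_v    # (n, d) values from the patch tokens
    logits = (k @ q) / (np.sqrt(q.shape[0]) * temperature)
    logits -= logits.max()    # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()  # softmax over patches
    return weights @ v, weights

# Toy dimensions (assumptions): 8-d CLIP vector, five 6-d patch tokens, 4-d head.
rng = np.random.default_rng(0)
clip_vec = rng.normal(size=8)
dino_patches = rng.normal(size=(5, 6))
w_q = rng.normal(size=(8, 4))
w_k = rng.normal(size=(6, 4))
w_v = rng.normal(size=(6, 4))

attended, weights = cross_attention_fuse(clip_vec, dino_patches, w_q, w_k, w_v)
print(attended.shape, weights.round(3))
```

Lowering `temperature` concentrates the weights on fewer patches, which is the behaviour a learned temperature would tune during training.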




✦ Quick Start

No installation required? Try ArtSleuth live on HuggingFace Spaces. Pretrained weights are on the HuggingFace Hub.

pip install artsleuth

Python

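import artsleuth

# Analyse a single image file and print a text summary of the result.
result = artsleuth.analyze("judith_slaying_holofernes.jpg")
print(result.summary())

# Generate the visual explanation and save it as an image.
explanation = result.explain()
explanation.save("analysis_overlay.png")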

CLI

artsleuth analyze painting.jpg
artsleuth style painting.jpg --top-k 5
artsleuth compare painting_a.jpg painting_b.jpg
artsleuth workshop painting.jpg
artsleuth robustness painting.jpg -r "Artemisia Gentileschi"
artsleuth benchmark --backbone dinov2 --backbone fusion
artsleuth demo



✦ Architecture

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#1A2E48', 'primaryTextColor': '#F0F0F0', 'primaryBorderColor': '#9DC0D8', 'lineColor': '#9DC0D8', 'secondaryColor': '#1A2E48', 'tertiaryColor': '#1A2E48', 'edgeLabelBackground': '#0D1117', 'clusterBkg': '#0D1117', 'clusterBorder': '#9DC0D8', 'titleColor': '#9DC0D8'}}}%%

graph TD
    %% Temporal drift is optional (off by default; needs user TemporalRegistry data)
    Input["Artwork Image"] --> Resize["Resize · Crop · Normalise"]

    Resize --> Patches["Patch Extraction"]
    Resize --> FullImage["Full-Image Encoding"]

    Patches --> DINO["DINOv2"]
    FullImage --> CLIPEnc["CLIP"]

    DINO --> Brushstroke["Brushstroke Analysis"]
    DINO --> WorkshopNode["Workshop Decomposition"]
    CLIPEnc --> Style["Style Classification"]

    DINO --> Concat["Feature Concatenation"]
    CLIPEnc --> Concat

    Concat --> Attribution["Attribution Scoring"]
    Attribution -.->|optional| Temporal["Temporal Drift Model"]

    Concat --> Forgery["Forgery Detection"]
    Concat -.->|optional| Adversarial["Adversarial Robustness"]

    DINO --> Explain["Saliency Maps"]

    Forgery --> Report["Analysis Report"]
    Attribution --> Report
    Temporal -.-> Report
    Style --> Report
    Brushstroke --> Report
    WorkshopNode --> Report
    Explain --> Report

    Report --> WebUI["Web UI"]
    Report --> CLI["CLI"]
    Report --> MCP["MCP Server"]

The inference pipeline concatenates CLIP + DINOv2 embeddings. Cross-attention fusion is available as a training-time architecture (used in the benchmark fine-tuning) but is not part of the default inference path.
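The concatenation path can be sketched in a few lines. This is a minimal illustration, assuming each backbone embedding is L2-normalised before concatenation (the README does not say whether ArtSleuth does this) and using tiny made-up vectors; `fuse_by_concat` and `cosine_similarity` are hypothetical helpers, not the library's API.

```python
import math

def l2_normalise(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def fuse_by_concat(dino_vec, clip_vec):
    """Concatenate the two normalised embeddings into one fused vector."""
    return l2_normalise(dino_vec) + l2_normalise(clip_vec)

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy 3-d embeddings standing in for real DINOv2 and CLIP outputs.
query = fuse_by_concat([0.2, 0.9, 0.1], [0.7, 0.1, 0.3])
references = {
    "reference_a": fuse_by_concat([0.25, 0.85, 0.05], [0.6, 0.2, 0.3]),
    "reference_b": fuse_by_concat([0.9, 0.1, 0.4], [0.1, 0.8, 0.2]),
}
ranked = sorted(references, key=lambda name: cosine_similarity(query, references[name]), reverse=True)
print(ranked)
```

Normalising each half first keeps one backbone from dominating the similarity simply because its embeddings have a larger norm.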


| Backbone | Strength | Used For |
| --- | --- | --- |
| DINOv2 | Fine-grained texture and structure | Brushstroke analysis · patch embeddings |
| CLIP | Semantic-stylistic understanding | Style classification · style embeddings |
| Concat | Complementary feature combination | Attribution · forgery detection |

Default backbone sizes: DINOv2 ViT-B/14 + CLIP ViT-L/14. First run downloads ~1–2 GB of pretrained weights from HuggingFace Hub.




✦ Benchmark

Linear probe and end-to-end evaluation on the full WikiArt dataset (81 444 images, 80/20 split, seed 42):

| Backbone | Style Acc | Style F1 | Artist Acc | Artist Top-5 | Genre Acc |
| --- | --- | --- | --- | --- | --- |
| DINOv2 · ViT-B/14 | 57.5 % | 0.553 | 64.7 % | 90.9 % | 71.0 % |
| CLIP · ViT-L/14 | 67.1 % | 0.656 | 74.6 % | 95.9 % | 75.0 % |
| Fusion · frozen | 65.0 % | 0.633 | 71.0 % | 94.2 % | 74.2 % |
| Fusion · fine-tuned † | 71.6 % | 0.703 | 77.8 % | 96.2 % | 75.1 % |
| Fusion · e2e † | 72.7 % | | 79.0 % | 96.9 % | 76.6 % |

Top three rows: logistic-regression linear probes on frozen backbones (macro-averaged across 27 styles, 129 artists, 11 genres). Reproducible via benchmarks/wikiart.py. † Bottom two rows: separate training runs with partial backbone unfreezing (last 3 transformer blocks), multi-task CE + supervised contrastive loss, AdamW, mixed-precision (5 epochs, effective batch 64). Training code not included in this repository; these numbers are reported for context.


Reproduce the frozen linear-probe benchmarks:

pip install artsleuth[benchmarks]
artsleuth benchmark --device cuda

 Forgery detection validation (one-class authentication)

For each of 125 named artists (≥ 80 works, excluding the catch-all "Unknown Artist" category), we fit a Mahalanobis-distance reference model from 80 % of their authenticated WikiArt works, then test whether held-out genuine paintings score lower (closer to the reference distribution) than impostor paintings by other artists. ROC-AUC = 1.0 means perfect separation; 0.5 means chance.

| Metric | DINOv2 ViT-B/14 | CLIP ViT-L/14 | Fused (concat) |
| --- | --- | --- | --- |
| Mean AUC | 0.873 | 0.958 | 0.897 |
| Median AUC | 0.895 | 0.970 | 0.918 |
| AUC ≥ 0.95 | 28 / 125 artists | 81 / 125 artists | 36 / 125 artists |
| AUC ≥ 0.90 | 62 / 125 artists | 113 / 125 artists | 75 / 125 artists |

Top 15 and bottom 5 by fused AUC:

| Artist | Works | DINOv2 | CLIP | Fused |
| --- | --- | --- | --- | --- |
| Sam Francis | 317 | 1.000 | 1.000 | 1.000 |
| Antoine Blanchard | 170 | 1.000 | 1.000 | 1.000 |
| Gene Davis | 155 | 1.000 | 1.000 | 1.000 |
| Fra Angelico | 167 | 0.995 | 0.990 | 0.998 |
| Juan Gris | 196 | 0.993 | 0.996 | 0.998 |
| Frans Hals | 176 | 0.992 | 0.999 | 0.997 |
| Édouard Cortès | 214 | 0.992 | 1.000 | 0.995 |
| El Greco | 159 | 0.981 | 1.000 | 0.993 |
| Fernand Léger | 223 | 0.987 | 0.976 | 0.993 |
| Anthony van Dyck | 163 | 0.985 | 1.000 | 0.992 |
| Maxime Maufra | 119 | 0.991 | 0.998 | 0.991 |
| Joshua Reynolds | 200 | 0.976 | 0.999 | 0.989 |
| Henri Fantin-Latour | 105 | 0.977 | 1.000 | 0.989 |
| Ivan Aivazovsky | 577 | 0.980 | 0.998 | 0.986 |
| Gustave Moreau | 83 | 0.965 | 1.000 | 0.983 |
| Salvador Dalí | 479 | 0.675 | 0.876 | 0.725 |
| Vasily Polenov | 225 | 0.684 | 0.926 | 0.723 |
| Jacek Malczewski | 91 | 0.659 | 0.961 | 0.709 |
| Mikhail Vrubel | 95 | 0.618 | 0.823 | 0.654 |
| M. C. Escher | 126 | 0.610 | 0.899 | 0.649 |

Mahalanobis-distance one-class classification on WikiArt (125 named artists, 80/20 split, equal genuine/impostor test sets, seed 42). Artists with distinctive visual signatures (El Greco, Fra Angelico, ukiyo-e prints) approach perfect separation; stylistically versatile artists (Dalí, Escher) are harder to model as a single distribution. Full per-artist results in artsleuth/benchmarks/forgery_validation_results.json.
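The validation protocol described in this section can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions, not the code that produced the reported numbers: the embeddings are random Gaussians rather than WikiArt features, the small ridge term added to the covariance is an assumption made for numerical stability, and `fit_reference`, `mahalanobis`, and `roc_auc` are hypothetical helper names.

```python
import numpy as np

def fit_reference(embeddings, ridge=1e-3):
    """Fit the one-class reference model: mean and inverse covariance."""
    mean = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False) + ridge * np.eye(embeddings.shape[1])
    return mean, np.linalg.inv(cov)

def mahalanobis(x, mean, precision):
    """Mahalanobis distance of each row of x from the reference distribution."""
    diff = x - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, precision, diff))

def roc_auc(genuine_scores, impostor_scores):
    """Probability that an impostor scores higher than a genuine work (ties count half)."""
    g = np.asarray(genuine_scores)[:, None]
    i = np.asarray(impostor_scores)[None, :]
    return float(((i > g).sum() + 0.5 * (i == g).sum()) / (g.size * i.size))

rng = np.random.default_rng(42)
artist_works = rng.normal(loc=0.0, size=(200, 8))  # one artist's embeddings (synthetic)
other_works = rng.normal(loc=3.0, size=(40, 8))    # works by other artists (synthetic)

# 80/20 split: fit on 160 works, hold out 40 genuine works for testing.
train, held_out = artist_works[:160], artist_works[160:]
mean, precision = fit_reference(train)

genuine = mahalanobis(held_out, mean, precision)
impostor = mahalanobis(other_works, mean, precision)
auc = roc_auc(genuine, impostor)
print(f"ROC-AUC = {auc:.3f}")
```

The held-out genuine and impostor sets are the same size here, mirroring the equal genuine/impostor test sets mentioned in the caption.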




✦ Related Work & Honest Limitations

Automated art classification has a rich history, and ArtSleuth builds on prior work that we want to acknowledge properly.


Prior art in style classification.  Saleh & Elgammal (2016) were among the first to apply metric learning to large-scale art datasets. Tan et al. (2016) trained a ResNet-50 on WikiArt and reported ~54 % style accuracy; their subsequent ArtGAN work (Tan et al., 2018) improved this to ~58 % by leveraging generative training. Chu & Wu (2018) showed that Gram-matrix representations of neural style features could reach ~63 %. More recently, multi-phase patch-based strategies (Bani & Abu-Naser, 2023) have reported high accuracy, though typically on reduced class sets or with micro-averaged metrics that weight common styles more heavily.

Backbone representations.  Our fusion approach is motivated by the observation — articulated clearly in recent work on style disentanglement (Jia et al., 2026) — that self-supervised models like DINOv2 (Oquab et al., 2024) and vision-language models like CLIP (Radford et al., 2021) encode fundamentally different aspects of visual style. DINOv2 captures texture and structure; CLIP captures semantic-categorical associations. Cross-attention lets each backbone inform the other, but we should note that this idea is closely related to multi-modal fusion strategies explored in VQA and image-text retrieval.

Workshop attribution.  Computational connoisseurship traces back to Lyu et al. (2004), who applied wavelet statistics to distinguish Bruegel from his imitators, and to Johnson et al. (2008), who used canvas-thread analysis for Vermeer attribution. Our Dirichlet-process approach to workshop decomposition is more flexible than these hand-crafted pipelines but has not yet been validated on the expert-curated datasets those studies used.
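For readers unfamiliar with Dirichlet-process mixtures, the sketch below shows the general idea using scikit-learn's `BayesianGaussianMixture` with a truncated Dirichlet-process prior. It is not ArtSleuth's workshop module: the 2-D "patch embeddings" are synthetic blobs, `infer_hands` is a hypothetical helper, and `max_hands` is only an upper bound (truncation level) that the model is free to undershoot.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

def infer_hands(patch_embeddings, max_hands=8, seed=0):
    """Cluster patch embeddings and report how many clusters are actually used."""
    model = BayesianGaussianMixture(
        n_components=max_hands,  # truncation level, not the final cluster count
        weight_concentration_prior_type="dirichlet_process",
        covariance_type="full",
        max_iter=500,
        random_state=seed,
    ).fit(patch_embeddings)
    labels = model.predict(patch_embeddings)
    n_hands = len(set(labels.tolist()))
    return labels, n_hands

# Three synthetic "hands": well-separated blobs standing in for patch embeddings.
rng = np.random.default_rng(0)
patches = np.vstack(
    [rng.normal(loc=centre, scale=0.3, size=(60, 2)) for centre in (-4.0, 0.0, 4.0)]
)
labels, n_hands = infer_hands(patches)
print(f"patches={len(labels)}, inferred hands={n_hands}")
```

Unlike k-means, the number of clusters is not fixed in advance; unused components receive negligible weight and no patches.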


| Method | Style Acc | Artist Acc | Classes (style / artist) | Protocol |
| --- | --- | --- | --- | --- |
| ResNet-50 (Tan et al., 2016) | 54.5 % | 56.5 % | 27 / 23 | WikiArt subset, weighted avg |
| ArtGAN (Tan et al., 2018) | 58.0 % | | 27 | WikiArt, GAN-augmented |
| Gram matrices (Chu & Wu, 2018) | 63.0 % | | 27 | WikiArt, micro avg |
| Deep ensemble (Manzoor et al., 2024) | 68.6 % | | 27 | WikiArt, stacking ensemble |
| ArtFusionNet (Kose & Guner, 2025) | 99.0 % | | 3 | WikiArt subset, 3 styles only |
| ArtSleuth Fusion · e2e | 72.7 % | 79.0 % | 27 / 129 | WikiArt full, 81k, macro avg |

Numbers are taken from the respective publications. Direct comparison is difficult: studies differ in the number of classes, dataset splits, averaging methods (micro vs. macro), and whether test sets overlap with training data. We list the protocol details we could verify so readers can judge for themselves.
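A small worked example shows why the averaging method matters. It assumes that "macro-averaged accuracy" means the unweighted mean of per-class accuracy and that "micro" means plain overall accuracy; the class names and counts are invented for illustration.

```python
from collections import defaultdict

def micro_accuracy(y_true, y_pred):
    """Overall fraction of correct predictions; large classes dominate."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_accuracy(y_true, y_pred):
    """Mean of per-class accuracy; every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# 90 images of a common style (all classified correctly) and
# 10 images of a rare style (only 2 classified correctly).
y_true = ["common_style"] * 90 + ["rare_style"] * 10
y_pred = ["common_style"] * 90 + ["rare_style"] * 2 + ["common_style"] * 8

print(micro_accuracy(y_true, y_pred))  # 0.92
print(macro_accuracy(y_true, y_pred))  # 0.6
```

The same predictions score 92 % micro but 60 % macro, so numbers reported under different averaging schemes are not directly comparable.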


Where we fall short — and we know it.

  • Compute-constrained training.  Fine-tuning ran for 5 epochs on a single GPU with 16 GB VRAM. More epochs, larger effective batches, or higher-VRAM GPUs (A100, H100) would very likely improve the numbers. We chose to report what we could reproduce on accessible hardware rather than extrapolate.

  • Frozen-fusion underperformance.  Our frozen cross-attention fusion (65.0 % style) actually trails bare CLIP (67.1 %). The fusion head needs gradient signal from task labels to learn a useful alignment — it does not help out of the box. We report this rather than hide it.

  • No standardised benchmark protocol.  WikiArt classification has no single accepted evaluation protocol. Class counts, splits, and averaging methods vary between papers, which makes apples-to-apples comparison frustratingly difficult. Our numbers use macro-averaging, which is the most conservative choice (each of the 27 styles counts equally, regardless of how many images it contains). Papers that report micro-averaged or weighted scores will usually appear higher on the same data.

  • Forgery detection validated on embeddings, not on physical forgeries.  We validated the one-class anomaly detector (Mahalanobis distance) on WikiArt across 125 named artists with ≥ 80 works. Mean ROC-AUC: 0.958 (CLIP), 0.897 (fused DINOv2 + CLIP), 0.873 (DINOv2 alone). Median fused AUC is 0.918; three artists reach perfect 1.000. Full per-artist results are in artsleuth/benchmarks/forgery_validation_results.json. However, this evaluates embedding-space separation between different artists — it does not test against actual physical forgeries authenticated by conservators, which is a harder and more practically relevant problem.

  • Workshop decomposition is unsupervised.  The Dirichlet-process model infers "hands" from embedding clusters, but there is no ground-truth labelled dataset of workshop paintings with per-region hand annotations to validate against. Art-historical validation by domain experts is still needed.

  • School predictions are randomly initialised.  Pretrained weights ship for period (27 WikiArt styles) and genre (11 WikiArt genres), but the school axis has no labelled training data yet. School predictions are therefore based on randomly initialised weights and should not be trusted until fine-tuned on an appropriate corpus.

  • Temporal drift requires dated references.  The Gaussian-process date estimator only works for artists whose dated reference embeddings are in the registry. No bundled references are shipped; temporal estimation is disabled by default and has no effect until the user populates a TemporalRegistry.

We consider these open problems, not failures. Contributions that address any of them — especially expert-curated evaluation datasets — would strengthen the project considerably.
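To illustrate what the temporal-drift limitation refers to, here is a minimal Gaussian-process sketch of temporal plausibility. It rests on several assumptions that the README does not confirm: each dated reference embedding is reduced to a single scalar "style coordinate" (for example a projection onto one axis), the kernel is a fixed RBF with hand-picked hyperparameters, and plausibility is a Gaussian score of the standardised residual. All function names are hypothetical and the data is synthetic.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, variance):
    """Squared-exponential kernel between two 1-D arrays of years."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(years, values, query_years, length_scale=10.0, variance=1.0, noise=0.05):
    """Posterior mean and predictive std of a zero-mean GP at query_years."""
    k_xx = rbf_kernel(years, years, length_scale, variance) + noise**2 * np.eye(len(years))
    k_qx = rbf_kernel(query_years, years, length_scale, variance)
    mean = k_qx @ np.linalg.solve(k_xx, values)
    solved = np.linalg.solve(k_xx, k_qx.T)
    var = variance - np.sum(k_qx * solved.T, axis=1)
    return mean, np.sqrt(np.maximum(var, 0.0) + noise**2)

def temporal_plausibility(observed, mean, std):
    """Score in (0, 1]: 1 when the observation matches the GP mean exactly."""
    z = (observed - mean) / std
    return float(np.exp(-0.5 * z**2))

# Synthetic dated references: the style coordinate drifts steadily from 1610 to 1650.
years = np.arange(1610.0, 1651.0, 5.0)
values = 0.05 * (years - 1630.0)

# A query work with style coordinate 0.5, tested against two claimed dates.
mean, std = gp_predict(years, values, np.array([1640.0, 1615.0]))
p_1640 = temporal_plausibility(0.5, mean[0], std[0])
p_1615 = temporal_plausibility(0.5, mean[1], std[1])
print(f"plausibility at 1640: {p_1640:.3f}, at 1615: {p_1615:.3g}")
```

With no dated references there is nothing to condition the GP on, which is why the estimator stays inactive until a registry is populated.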


 Full reference list
  • Bani, M. & Abu-Naser, S. S. (2023). Artistic style recognition: combining deep and shallow neural networks for painting classification. Mathematics, 11(22), 4564. doi:10.3390/math11224564
  • Berenson, B. (1902). The Study and Criticism of Italian Art. George Bell & Sons.
  • Blei, D. M. & Jordan, M. I. (2006). Variational inference for Dirichlet process mixtures. Bayesian Analysis, 1(1), 121–143. doi:10.1214/06-BA104
  • Caron, M. et al. (2021). Emerging properties in self-supervised vision transformers. ICCV. arXiv:2104.14294
  • Chu, W.-T. & Wu, Y.-L. (2018). Image style classification based on learnt deep correlation features. IEEE Trans. Multimedia, 20(9), 2491–2502. doi:10.1109/TMM.2018.2801718
  • Jia, Z., Zhang, J. & Zhou, J. (2026). StyleDecoupler: generalizable artistic style disentanglement. arXiv:2601.17697
  • Johnson, C. R. et al. (2008). Image processing for artist identification. IEEE Signal Processing Magazine, 25(4), 37–48. doi:10.1109/MSP.2008.923513
  • Jose, J. et al. (2025). DINOv2 meets text: a unified framework for image- and pixel-level vision-language alignment. CVPR. arXiv:2501.00564
  • Kose, U. & Guner, B. (2025). Enhancing artistic style classification through a novel ArtFusionNet framework. Scientific Reports, 15, 20087. doi:10.1038/s41598-025-04825-y (Note: evaluated on 3 style classes.)
  • Lyu, S., Rockmore, D. & Farid, H. (2004). A digital technique for art authentication. PNAS, 101(49), 17006–17010. doi:10.1073/pnas.0406398101
  • Manzoor, T. et al. (2024). Deep ensemble art style recognition. arXiv:2405.11675
  • Morelli, G. (1890). Italian Painters: Critical Studies of Their Works. John Murray.
  • Oquab, M. et al. (2024). DINOv2: Learning robust visual features without supervision. TMLR. arXiv:2304.07193
  • Radford, A. et al. (2021). Learning transferable visual models from natural language supervision. ICML. arXiv:2103.00020
  • Rasmussen, C. E. & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
  • Saleh, B. & Elgammal, A. (2016). Large-scale classification of fine-art paintings. JOCCH, 8(4), 1–24. doi:10.1145/2801634
  • Selvaraju, R. R. et al. (2017). Grad-CAM: visual explanations from deep networks via gradient-based localization. ICCV. doi:10.1109/ICCV.2017.74
  • Tan, W. R. et al. (2016). Ceci n'est pas une pipe: a deep convolutional network for fine-art paintings classification. ICIP. doi:10.1109/ICIP.2016.7533051
  • Tan, W. R. et al. (2018). ArtGAN: artwork synthesis with conditional categorical GANs. IEEE Trans. Image Processing, 27(10), 4846–4860. doi:10.1109/TIP.2018.2845388
  • Vaswani, A. et al. (2017). Attention is all you need. NeurIPS. arXiv:1706.03762



✦ Web Demo

An interactive Gradio interface with five analysis tabs: full pipeline, side-by-side comparison, workshop decomposition, temporal dating, and benchmark dashboard.

pip install artsleuth[web]
artsleuth demo

Or try the live demo on HuggingFace Spaces — no installation required. The GitHub Pages landing page provides an immersive overview of the framework.




✦ MCP Server

ArtSleuth ships as an MCP server, enabling AI assistants to perform art analysis conversationally.

artsleuth server

| Tool | Description |
| --- | --- |
| analyze_artwork | Full analysis pipeline |
| classify_style | Period, school, genre classification |
| compare_works | Side-by-side stylistic comparison |
| detect_anomalies | Forgery screening against a reference corpus |

 Claude Desktop configuration
{
  "mcpServers": {
    "artsleuth": {
      "command": "artsleuth",
      "args": ["server"]
    }
  }
}



✦ Repository Structure

ArtSleuth/
├── artsleuth/
│   ├── core/                  # Analysis engines
│   │   ├── brushstroke.py     #   Brushstroke pattern extraction
│   │   ├── style.py           #   Style classification
│   │   ├── attribution.py     #   Artist attribution scoring
│   │   ├── forgery.py         #   Anomaly-based forgery detection
│   │   ├── explainability.py  #   Gradient saliency overlays
│   │   ├── temporal.py        #   Temporal style drift (GP)
│   │   ├── workshop.py        #   Bayesian workshop decomposition
│   │   ├── adversarial.py     #   Adversarial robustness testing
│   │   └── pipeline.py        #   Unified analysis orchestrator
│   ├── models/                # Backbone & head architectures
│   │   ├── backbones.py       #   DINOv2 & CLIP loaders
│   │   ├── fusion.py          #   Cross-attention backbone fusion
│   │   ├── heads.py           #   Task-specific linear heads
│   │   └── registry.py        #   HuggingFace model registry
│   ├── preprocessing/         # Art-specific transforms
│   │   ├── transforms.py      #   Varnish, crack, canvas correction
│   │   └── patches.py         #   Grid, salient, adaptive extraction
│   ├── benchmarks/            # Evaluation suite
│   │   ├── wikiart.py         #   WikiArt dataset + linear probes
│   │   └── evaluate.py        #   Multi-backbone comparison runner
│   ├── mcp/                   # MCP server
│   │   └── server.py          #   Tool definitions & handlers
│   ├── cli/                   # Command-line interface
│   │   └── main.py            #   Click-based CLI
│   └── utils/                 # Shared utilities
│       ├── visualization.py   #   Publication-quality figures
│       └── io.py              #   Image loading & saving
├── web/                       # Gradio web demo
│   ├── app.py                 #   Main application (5 tabs)
│   ├── theme.py               #   Custom ArtSleuth theme
│   └── components.py          #   Reusable UI builders
├── tests/                     # Pytest suite (9 test modules)
├── examples/                  # Jupyter notebooks
├── docs/                      # Methodology & guides
├── assets/                    # Visual assets
└── index.html                 # GitHub Pages landing site



✦ Development

git clone https://github.com/ladyFaye1998/ArtSleuth.git
cd ArtSleuth
pip install -e ".[all]"

pytest
ruff check .
mypy artsleuth



✦ Methodology

ArtSleuth draws on two traditions:

Art history — Giovanni Morelli's observation (1890) that an artist's most characteristic habits reside in the least-conscious passages. Bernard Berenson's refinement of this into systematic connoisseurship. The workshop-attribution methodology developed for the Gentileschi debate, where master and assistants each contribute recognisable passages to a shared canvas.

Computer science — Self-supervised vision transformers (Caron et al., 2021; Oquab et al., 2024) that learn rich visual features without task-specific labels. Contrastive vision-language models (Radford et al., 2021) that ground visual concepts in linguistic semantics. Cross-attention fusion (Vaswani et al., 2017; Jose et al., 2025) for multi-modal feature alignment. Dirichlet process mixtures (Blei & Jordan, 2006) for non-parametric clustering. Gaussian processes (Rasmussen & Williams, 2006) for temporal modelling.

The two complement each other: art history provides the questions; machine learning provides a scale of analysis that would be impractical by eye alone.

See docs/methodology.md for the full technical discussion.




✦ Citation

@software{lesin2026artsleuth,
  author    = {Lesin, Danielle},
  title     = {{ArtSleuth}: Computational Art Analysis Framework},
  year      = {2026},
  url       = {https://github.com/ladyFaye1998/ArtSleuth},
  license   = {MIT}
}



✦ Contributing

Contributions are welcome from art historians, ML researchers, conservators, and anyone interested in computational approaches to cultural heritage.


| Area | What's Needed |
| --- | --- |
| Reference corpora | Curated, well-attributed image sets for specific artists or periods |
| Temporal references | Dated works for training the temporal style drift model |
| Model improvements | Better backbones, training strategies, evaluation benchmarks |
| Art-historical review | Ensuring taxonomy, terminology, and methodology stay sound |
| Web UI | Gradio component improvements, accessibility, visualisation refinements |
| Bug reports | Open an issue with reproduction steps |

See CONTRIBUTING.md for guidelines.




Built with 🫖 by Danielle Lesin · Where connoisseurship meets computation


