
Generate poetry and text from tree bark images using deterministic feature mapping


Barkprints 🌳

Generate text from tree bark images using embedding-based corpus matching.

Concept

Barkprints explores the idea of hidden language in tree bark patterns. Instead of trying to recognize what is in an image, it treats numerical features computed from the pixels as if they were already a text embedding, and finds the most similar actual text in a corpus.

Every bark image produces a deterministic voice: each unique bark pattern maps to a fixed position in embedding space, which is matched to specific sentences in the chosen corpus. The same image always finds the same text, as if each tree speaks through its texture.

How It Works

  1. Feature Extraction: Extracts a numerical feature vector from the bark image, fitted to the corpus embedding dimension (384 for the default model)

    • Color histograms, texture gradients, spatial statistics
    • Frequency features via DCT coefficients
    • Normalized to an embedding-like range of [-1, 1]
  2. Corpus Loading: Loads a pre-embedded text corpus

    • Sentences are embedded using sentence-transformers
    • Stored with their embeddings for fast matching
  3. Similarity Matching: Finds the nearest text via cosine similarity

    • Treats image features as if they were text embeddings
    • Computes similarity with all corpus embeddings
    • Returns the closest match(es)
  4. Deterministic Output: Same image + corpus = same text, always
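As a rough illustration of step 1, here is a minimal NumPy sketch of how color histograms, gradient statistics, grid statistics, and low-frequency DCT coefficients could be combined into one vector and squashed into [-1, 1]. The function names, bin count, grid size, DCT block size, and the tanh normalization are my assumptions, not the contents of barkprints' feature_extractor.py. It expects an image already decoded (and ideally downscaled) into an (H, W, 3) uint8 array.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an n x n matrix (row k = k-th cosine basis)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)  # scale the DC row so the matrix is orthonormal
    return m


def extract_features(img: np.ndarray, bins: int = 16, grid: int = 4, dct_k: int = 8) -> np.ndarray:
    """Hypothetical bark feature extractor: (H, W, 3) uint8 RGB -> 1-D vector in [-1, 1]."""
    h, w = img.shape[:2]
    if min(h, w) < max(grid, dct_k):
        raise ValueError("image too small for the chosen grid/DCT sizes")
    rgb = img.astype(np.float64) / 255.0
    gray = rgb.mean(axis=2)

    # Color histograms: one histogram per channel, normalized to sum to 1
    hists = [np.histogram(rgb[..., c], bins=bins, range=(0.0, 1.0))[0] / gray.size
             for c in range(3)]

    # Texture gradients: summary statistics of the grayscale gradient field
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)
    grad_stats = [mag.mean(), mag.std(), np.abs(gx).mean(), np.abs(gy).mean()]

    # Spatial statistics: mean and standard deviation of each cell in a grid x grid layout
    cells = []
    for r in range(grid):
        for c in range(grid):
            cell = gray[r * h // grid:(r + 1) * h // grid, c * w // grid:(c + 1) * w // grid]
            cells.extend([cell.mean(), cell.std()])

    # Frequency features: low-frequency corner of the 2-D DCT
    # (builds dense H x H and W x W matrices, so downscale large photos first)
    coeffs = dct_matrix(h) @ gray @ dct_matrix(w).T
    freq = coeffs[:dct_k, :dct_k].ravel()

    raw = np.concatenate([np.concatenate(hists), grad_stats, cells, freq])

    # Standardize, then squash with tanh so every entry lands in [-1, 1]
    std = raw.std()
    z = (raw - raw.mean()) / std if std > 0 else np.zeros_like(raw)
    return np.tanh(z)
```

With the default sizes this toy version yields 148 values (48 histogram bins, 4 gradient statistics, 32 grid statistics, 64 DCT coefficients) rather than the roughly 700 the project describes; the count depends entirely on the chosen sizes. Because every operation is a pure function of the pixel values, the same array always produces the same vector.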

Installation

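From a checkout of the source repository, using uv (recommended):

cd barkprints
uv sync

uv sync installs the project into a local virtual environment (.venv), so either activate it or prefix the commands below with uv run, e.g. uv run barkprints barks.jpg -c nature.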

Quick Start

Generate text from a bark image:

barkprints barks.jpg -c nature

Try a different corpus:

barkprints barks.jpg -c literature

Get top 3 matches with similarity scores:

barkprints barks.jpg -c nature --top-k 3

Process multiple images:

barkprints tree1.jpg tree2.jpg tree3.jpg -c nature

Usage

barkprints <image> [<image> ...] [options]

Options:
  -c, --corpus NAME   Corpus to use (default: nature)
  --top-k K           Return top K matches (default: 1)
  --list-corpora      Show available corpora
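If you want a wrapper script that accepts the same arguments and forwards them, the documented options map directly onto Python's argparse. This is a hypothetical mirror of the interface listed above, not barkprints' own entry point; the validation rules in parse() (requiring an image unless --list-corpora is given, and --top-k being at least 1) are my assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented barkprints options."""
    parser = argparse.ArgumentParser(
        prog="barkprints", description="Generate text from tree bark images."
    )
    # One or more images; optional here so that --list-corpora can run on its own
    parser.add_argument("images", nargs="*", metavar="image", help="bark image path(s)")
    parser.add_argument("-c", "--corpus", metavar="NAME", default="nature",
                        help="corpus to use (default: nature)")
    parser.add_argument("--top-k", metavar="K", type=int, default=1,
                        help="return top K matches (default: 1)")
    parser.add_argument("--list-corpora", action="store_true",
                        help="show available corpora")
    return parser


def parse(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and apply the assumed sanity checks."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.images and not args.list_corpora:
        parser.error("at least one image is required")  # exits with status 2
    if args.top_k < 1:
        parser.error("--top-k must be at least 1")
    return args
```

For example, parse(["tree1.jpg", "tree2.jpg", "-c", "literature", "--top-k", "3"]) yields images == ["tree1.jpg", "tree2.jpg"], corpus == "literature", and top_k == 3.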

Built-in Corpora

Nature

  • Theme: Nature and forest wisdom
  • Size: 50 sentences
  • Content: Reflections on trees, seasons, growth, and natural cycles

Literature

  • Theme: Philosophical and literary quotes
  • Size: 30 sentences
  • Content: Classical wisdom and philosophical observations

Creating Custom Corpora

Create a corpus from any text file:

python -m barkprints.corpus_builder input.txt output.npz --name mycorpus --theme "Your theme"

Example: Create a News Corpus

# 1. Create a text file with sentences (one per line or paragraph)
cat > news.txt << EOF
Technology continues to reshape modern society.
Climate change demands urgent global action.
Scientific breakthroughs offer hope for the future.
EOF

# 2. Build the corpus with embeddings
uv run python -m barkprints.corpus_builder news.txt src/barkprints/corpora/news.npz --theme "Current events"

# 3. Use it
barkprints barks.jpg -c news

Corpus Guidelines

  • Text quality: Use well-formed, complete sentences
  • Sentence length: 10-200 characters work best
  • Diversity: Include varied language for richer matching
  • Theme coherence: Keep sentences thematically related
  • Size: 30-100 sentences is a good range
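A small pre-processing pass can enforce these guidelines before running the builder. The helper below is hypothetical (not part of barkprints) and assumes one sentence per line, as in the news example; the 10-200 character bounds come from the guidelines above, and dropping exact duplicates is my addition in the spirit of the diversity guideline.

```python
def prepare_sentences(text: str, min_len: int = 10, max_len: int = 200) -> list[str]:
    """Hypothetical helper: one candidate sentence per non-empty line,
    whitespace-normalized, length-filtered, with exact duplicates dropped."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in text.splitlines():
        sentence = " ".join(line.split())  # trim and collapse internal whitespace
        if not (min_len <= len(sentence) <= max_len) or sentence in seen:
            continue
        seen.add(sentence)
        kept.append(sentence)
    return kept
```

The cleaned list can be written back out with "\n".join(...) and passed to corpus_builder; checking 30 <= len(sentences) <= 100 then covers the size guideline.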

Example Outputs

With barks.jpg:

$ barkprints barks.jpg -c nature
Death feeds new life in endless succession.

$ barkprints barks.jpg -c literature
The journey matters more than the destination itself.

$ barkprints barks.jpg -c nature --top-k 3
Top 3 matches:
1. [0.067] Death feeds new life in endless succession.
2. [0.063] Decay transforms into fertile soil again.
3. [0.062] Connection requires opening the heart completely.

Programmatic Usage

from barkprints.text_generator import TextGenerator

generator = TextGenerator()

# Get single match
text = generator.generate("barks.jpg", "nature")
print(text)

# Get top 3 matches with scores
matches = generator.generate("barks.jpg", "nature", top_k=3)
for sentence, score in matches:
    print(f"[{score:.3f}] {sentence}")

# The same image always produces the same output
text2 = generator.generate("barks.jpg", "nature")
assert text == text2  # Always True!

Technical Details

Feature Vector: Extracts roughly 700 raw numerical features from each image, then pads or truncates them to match the corpus embedding dimension (384 for the default all-MiniLM-L6-v2 model).

Embedding Model: Uses all-MiniLM-L6-v2 by default (384 dimensions). Any sentence-transformers model can be used instead by specifying --model when building a corpus.
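The pad-or-truncate step can be sketched as follows. Zero-padding at the end and keeping the leading features on truncation are my assumptions; the project only states that features are padded or truncated, and fit_to_dim is a hypothetical name.

```python
import numpy as np


def fit_to_dim(features, dim: int) -> np.ndarray:
    """Zero-pad or truncate a 1-D feature vector to exactly `dim` entries (assumed scheme)."""
    v = np.asarray(features, dtype=np.float64).ravel()
    if v.size >= dim:
        return v[:dim].copy()  # truncate: keep the first `dim` features
    return np.pad(v, (0, dim - v.size))  # pad: append zeros at the end
```

Under this scheme, roughly 700 raw features fitted to 384 dimensions means everything after the first 384 is discarded, so the order in which feature groups are concatenated decides which ones influence the match. When a model with more dimensions is used, the padded zeros contribute nothing to the dot product in the similarity score.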

Corpus Format: .npz files containing:

  • sentences: array of text strings
  • embeddings: (N, D) matrix of embeddings
  • metadata: dict with corpus info
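A corpus in this layout can be written and read with plain NumPy. This sketch assumes the metadata dict is stored as a pickled 0-d object array (the project does not say how it is serialized), and save_corpus/load_corpus are hypothetical names, not the API of corpus_loader.py. Loading a pickled entry requires allow_pickle=True, which can execute arbitrary code embedded in the file, so only load corpus files you trust.

```python
import io

import numpy as np


def save_corpus(file, sentences, embeddings, metadata: dict) -> None:
    """Write sentences, an (N, D) embedding matrix, and a metadata dict to an .npz file."""
    emb = np.asarray(embeddings, dtype=np.float32)
    if emb.ndim != 2 or emb.shape[0] != len(sentences):
        raise ValueError("embeddings must have shape (len(sentences), D)")
    np.savez(
        file,
        sentences=np.array(sentences, dtype=str),
        embeddings=emb,
        metadata=np.array(metadata, dtype=object),  # 0-d object array wrapping the dict
    )


def load_corpus(file):
    """Read back (sentences, embeddings, metadata); allow_pickle is needed for the dict."""
    with np.load(file, allow_pickle=True) as data:
        sentences = data["sentences"].tolist()
        embeddings = data["embeddings"]
        metadata = data["metadata"].item()
    return sentences, embeddings, metadata


# Round trip through an in-memory buffer (a path such as "news.npz" works the same way)
buffer = io.BytesIO()
save_corpus(buffer, ["Bark remembers every winter.", "Roots hold the hill together."],
            np.zeros((2, 384)), {"name": "demo", "theme": "Trees"})
buffer.seek(0)
sentences, embeddings, metadata = load_corpus(buffer)
```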

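Similarity Metric: Cosine similarity between the normalized image feature vector and each normalized corpus embedding. Scores range from -1 to 1, and higher scores indicate closer matches. Because image features are not genuine text embeddings, absolute scores can be quite low (about 0.06 in the example outputs above); only their ranking decides which text is returned.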
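The matching step amounts to a cosine-similarity nearest-neighbor search. Here is a minimal NumPy version, assuming the corpus is held as a list of sentences plus an (N, D) embedding matrix; top_k_matches is a hypothetical name rather than embedding_matcher.py's actual code, and the stable tie-breaking is my addition to keep rankings reproducible when scores are equal.

```python
import numpy as np


def top_k_matches(query, embeddings, sentences, k: int = 1):
    """Return the k (sentence, cosine score) pairs closest to `query`, best first."""
    q = np.asarray(query, dtype=np.float64)
    emb = np.asarray(embeddings, dtype=np.float64)

    # Normalize the query and each corpus row to unit length (zero vectors stay zero)
    q_norm = np.linalg.norm(q)
    if q_norm > 0:
        q = q / q_norm
    row_norms = np.linalg.norm(emb, axis=1, keepdims=True)
    row_norms[row_norms == 0] = 1.0
    scores = (emb / row_norms) @ q  # cosine similarity, each in [-1, 1]

    # Sort by descending score; "stable" keeps ties in corpus order, so results are repeatable
    order = np.argsort(-scores, kind="stable")[:k]
    return [(sentences[i], float(scores[i])) for i in order]
```

Calling it with k=3 corresponds to --top-k 3 on the command line.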

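Determinism: Same pixels → same features → same nearest neighbor → same text output (for a given corpus). Re-encoding, resizing, or cropping an image changes its pixels and can therefore change the result.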

Development

Running Tests

uv run pytest

Project Structure

barkprints/
├── src/barkprints/
│   ├── feature_extractor.py    # Image → 384-D feature vector
│   ├── corpus.py               # Corpus data structure
│   ├── corpus_loader.py        # Load .npz corpus files
│   ├── corpus_builder.py       # Build corpora from text
│   ├── embedding_matcher.py    # Cosine similarity matching
│   ├── text_generator.py       # Main generation pipeline
│   └── corpora/                # Built-in corpus files
│       ├── nature.npz
│       └── literature.npz
├── tests/                      # Test suite
├── pyproject.toml              # Python project config
└── README.md                   # This file

Philosophy

This project treats images and text as inhabitants of the same conceptual space, a space of meaning represented numerically. By pretending that image features are text embeddings, we create a poetic bridge between visual texture and language. Each tree's unique bark pattern, its barkprint, becomes a coordinate in this shared space that points to a specific human expression.

For me, the project is an invitation to look at trees and wonder what they might say, which adds a playful layer to how I interact with the nature around me. It also probes an interesting assumption: that reality (human language, images of nature) could (can?) be compressed into a numerical representation. If everything is represented as numbers and we strip it down to that layer, how universal are those numbers? By treating an image vector as a text vector, this project deliberately skips a translation step that is normally necessary (and that works amazingly well these days, with CLIP and similar models).

This makes me think that there are different dialects of numeric language that need translation between each other. There is a visual numeric dialect and a text-based one. Both are expressed in numbers, but the numbers mean different things, much like different human languages. Anyway, it's all just for fun, and climbing.

License

MIT

Credits

Created as an artistic exploration of the relationship between visual patterns and language through numerical representation.

Dependencies:



Download files


Source Distribution

barkprints-0.1.0.tar.gz (1.1 MB)

Uploaded Source

Built Distribution


barkprints-0.1.0-py3-none-any.whl (132.2 kB)

Uploaded Python 3

File details

Details for the file barkprints-0.1.0.tar.gz.

File metadata

  • Download URL: barkprints-0.1.0.tar.gz
  • Upload date:
  • Size: 1.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.3

File hashes

Hashes for barkprints-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b74fd5ae4484994896030015b140f8e30c110db9c84ba8321cad867e3629edfb
MD5 bfee156038d432054561b1266eb1e640
BLAKE2b-256 9e5290786731fc78b054a454cbbd13a687dffacefdce5e078167649493639624


File details

Details for the file barkprints-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: barkprints-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 132.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.3

File hashes

Hashes for barkprints-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9c474cb3b4392e45157279252d02b9bf71ea196fbbd2f559d02a5accb60f4dd3
MD5 1e72244fb3104052c94b557616284a07
BLAKE2b-256 6116fb49db748752383a1e307cb3084d8a4e89e32ad7d09d59e43d58be2e76d8

