Generate poetry and text from tree bark images using deterministic feature mapping

Barkprints 🌳

Generate text from tree bark images using embedding-based corpus matching.

Concept

Barkprints explores the idea of finding hidden language in tree bark patterns. Instead of attempting to recognize what's in an image, it treats numerical features derived from the pixel data as if they were already a text embedding and finds the most similar actual text in a corpus.

Every bark image produces a deterministic voice. Each unique bark pattern maps to a consistent position in embedding space, which matches to specific sentences in a chosen corpus. The same image always finds the same text, as if each tree speaks through its texture.

How It Works

1. Feature Extraction: Extracts a numerical feature vector from the bark image, sized to match the corpus embeddings (384 dimensions by default)
   - Color histograms, texture gradients, spatial statistics
   - Frequency features via DCT coefficients
   - Normalized to embedding-like range [-1, 1]
2. Corpus Loading: Loads a pre-embedded text corpus
   - Sentences are embedded using sentence-transformers
   - Stored with their embeddings for fast matching
3. Similarity Matching: Finds the nearest text via cosine similarity
   - Treats image features as if they were text embeddings
   - Computes similarity with all corpus embeddings
   - Returns the closest match(es)
4. Deterministic Output: Same image + corpus = same text, always
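
The matching step can be illustrated with a minimal sketch. It is not the package's actual implementation: the function names `cosine_similarity` and `top_matches` are hypothetical, the sentences and 3-dimensional vectors are toy data rather than a real corpus, and it uses only the standard library for portability even though the project itself depends on NumPy.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_a * norm_b)

def top_matches(image_features, sentences, embeddings, top_k=1):
    """Rank corpus sentences by similarity to the image feature vector."""
    scored = [
        (sentence, cosine_similarity(image_features, emb))
        for sentence, emb in zip(sentences, embeddings)
    ]
    # The sort is stable, so ties keep corpus order and the ranking is deterministic.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy data standing in for real 384-dimensional vectors (assumption for illustration).
sentences = [
    "Roots hold the hillside.",
    "Rain returns to the river.",
    "Moss keeps the north side.",
]
embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
features = [0.9, 0.1, 0.0]  # pretend this came from a bark image

best = top_matches(features, sentences, embeddings, top_k=2)
for sentence, score in best:
    print(f"[{score:.3f}] {sentence}")
```

Because nothing in the ranking is random, calling `top_matches` again with the same inputs returns the same list, which is the property step 4 describes.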

Installation

Using uv (recommended):

cd barkprints
uv sync

Quick Start

Generate text from a bark image:

barkprints barks.jpg -c nature

Try a different corpus:

barkprints barks.jpg -c literature

Get top 3 matches with similarity scores:

barkprints barks.jpg -c nature --top-k 3

Process multiple images:

barkprints tree1.jpg tree2.jpg tree3.jpg -c nature

Usage

barkprints <image> [<image> ...] [options]

Options:
  -c, --corpus NAME    Corpus to use (default: nature)
  --top-k K            Return top K matches (default: 1)
  --list-corpora       Show available corpora
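
For readers who want to see how options like these map onto parsed values, here is a hedged sketch using the standard library's `argparse`. It mirrors only the flags documented above; `build_parser` is a hypothetical name and the real CLI may be built differently.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(
        prog="barkprints",
        description="Generate text from tree bark images.",
    )
    # nargs="*" lets --list-corpora be used without naming an image.
    parser.add_argument("images", nargs="*", metavar="image",
                        help="one or more bark image files")
    parser.add_argument("-c", "--corpus", default="nature", metavar="NAME",
                        help="corpus to use (default: nature)")
    parser.add_argument("--top-k", type=int, default=1, metavar="K",
                        help="return top K matches (default: 1)")
    parser.add_argument("--list-corpora", action="store_true",
                        help="show available corpora")
    return parser

args = build_parser().parse_args(
    ["tree1.jpg", "tree2.jpg", "-c", "literature", "--top-k", "3"]
)
print(args.images, args.corpus, args.top_k, args.list_corpora)
```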

Built-in Corpora

Nature

Theme: Nature and forest wisdom
Size: 50 sentences
Content: Reflections on trees, seasons, growth, and natural cycles

Literature

Theme: Philosophical and literary quotes
Size: 30 sentences
Content: Classical wisdom and philosophical observations

Creating Custom Corpora

Create a corpus from any text file:

python -m barkprints.corpus_builder input.txt output.npz --name mycorpus --theme "Your theme"

Example: Create a News Corpus

# 1. Create a text file with sentences (one per line or paragraph)
cat > news.txt << EOF
Technology continues to reshape modern society.
Climate change demands urgent global action.
Scientific breakthroughs offer hope for the future.
EOF

# 2. Build the corpus with embeddings
uv run python -m barkprints.corpus_builder news.txt src/barkprints/corpora/news.npz --theme "Current events"

# 3. Use it
barkprints barks.jpg -c news

Corpus Guidelines

Text quality: Use well-formed, complete sentences
Sentence length: 10-200 characters work best
Diversity: Include varied language for richer matching
Theme coherence: Keep sentences thematically related
Size: 30-100 sentences is a good range
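
The measurable guidelines (sentence length and corpus size) can be checked before building a corpus. The following is a small sketch of such a check; `check_corpus` is a hypothetical helper that is not part of barkprints, and treating exact duplicates as a diversity problem is an assumption of this example.

```python
def check_corpus(sentences, min_chars=10, max_chars=200, min_size=30, max_size=100):
    """Return a list of warnings; an empty list means no issues were found."""
    warnings = []
    cleaned = [s.strip() for s in sentences if s.strip()]  # ignore blank lines
    if not min_size <= len(cleaned) <= max_size:
        warnings.append(
            f"corpus has {len(cleaned)} sentences; "
            f"{min_size}-{max_size} is the suggested range"
        )
    for s in cleaned:
        if not min_chars <= len(s) <= max_chars:
            warnings.append(
                f"length {len(s)} outside {min_chars}-{max_chars} chars: {s[:40]!r}"
            )
    duplicates = len(cleaned) - len(set(cleaned))
    if duplicates:
        warnings.append(f"{duplicates} duplicate sentence(s) reduce diversity")
    return warnings

# The three sentences from the news example above.
news = [
    "Technology continues to reshape modern society.",
    "Climate change demands urgent global action.",
    "Scientific breakthroughs offer hope for the future.",
]
report = check_corpus(news)
for line in report:
    print("warning:", line)
```

The three-line news example passes the length check but triggers the size warning, since it is well below the suggested 30 sentences.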

Example Outputs

With barks.jpg:

$ barkprints barks.jpg -c nature
Death feeds new life in endless succession.

$ barkprints barks.jpg -c literature
The journey matters more than the destination itself.

$ barkprints barks.jpg -c nature --top-k 3
Top 3 matches:
1. [0.067] Death feeds new life in endless succession.
2. [0.063] Decay transforms into fertile soil again.
3. [0.062] Connection requires opening the heart completely.

Programmatic Usage

from barkprints.text_generator import TextGenerator

generator = TextGenerator()

# Get single match
text = generator.generate("barks.jpg", "nature")
print(text)

# Get top 3 matches with scores
matches = generator.generate("barks.jpg", "nature", top_k=3)
for sentence, score in matches:
    print(f"[{score:.3f}] {sentence}")

# The same image always produces the same output
text2 = generator.generate("barks.jpg", "nature")
assert text == text2  # Always True!

Technical Details

Feature Vector: Extracts ~700 numerical features from images, then pads/truncates to match corpus embedding dimensions (typically 384 for all-MiniLM-L6-v2).
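
A minimal sketch of the scale-then-resize idea follows. The details are assumptions rather than the package's confirmed behavior: padding with zeros, min-max scaling into [-1, 1], and the order of the two steps are all choices made for this illustration, and `fit_to_dimension` and `scale_to_unit_range` are hypothetical names.

```python
def fit_to_dimension(features, target_dim=384):
    """Truncate or zero-pad a feature list to exactly target_dim values."""
    if len(features) >= target_dim:
        return list(features[:target_dim])
    return list(features) + [0.0] * (target_dim - len(features))

def scale_to_unit_range(features):
    """Min-max scale values into [-1, 1]; a constant vector maps to all zeros."""
    lo, hi = min(features), max(features)
    if hi == lo:
        return [0.0] * len(features)
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in features]

# Stand-in for roughly 700 raw image features (synthetic values, not real data).
raw = [float(i % 17) for i in range(700)]
vector = fit_to_dimension(scale_to_unit_range(raw), target_dim=384)
print(len(vector), min(vector), max(vector))
```

Both functions are pure, so the same raw features always yield the same vector.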

Embedding Model: Uses all-MiniLM-L6-v2 by default (384 dimensions). Any sentence-transformers model can be used by passing --model when building a corpus.

Corpus Format: .npz files containing:

sentences: array of text strings
embeddings: (N, D) matrix of embeddings
metadata: dict with corpus info
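
To make the layout concrete, here is a hedged sketch that writes and reads a file with these three keys using NumPy. The metadata field names (`name`, `theme`) and the zero-filled embeddings are placeholders chosen for illustration; the real corpus builder may store things differently. Note that a Python dict saved this way is pickled, so loading it requires `allow_pickle=True`.

```python
import os
import tempfile

import numpy as np

sentences = np.array(["First toy sentence.", "Second toy sentence."])
embeddings = np.zeros((2, 384), dtype=np.float32)  # placeholder, not real embeddings
metadata = {"name": "toy", "theme": "Example"}     # hypothetical metadata fields

path = os.path.join(tempfile.mkdtemp(), "toy.npz")
np.savez(
    path,
    sentences=sentences,
    embeddings=embeddings,
    metadata=np.array(metadata, dtype=object),  # dict wrapped in a 0-d object array
)

# allow_pickle=True is needed because the metadata dict is stored as a pickled object.
with np.load(path, allow_pickle=True) as data:
    loaded_sentences = data["sentences"].tolist()
    loaded_embeddings = data["embeddings"]
    loaded_metadata = data["metadata"].item()  # unwrap the 0-d array back to a dict

print(loaded_sentences, loaded_embeddings.shape, loaded_metadata)
```

Only load pickled `.npz` files from sources you trust, since unpickling can execute arbitrary code.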

Similarity Metric: Cosine similarity between normalized vectors. Higher scores indicate better matches.

Determinism: Same pixels → same features → same nearest neighbor → same text output

Development

Running Tests

uv run pytest

Project Structure

barkprints/
├── src/barkprints/
│   ├── feature_extractor.py   # Image → 384-D feature vector
│   ├── corpus.py               # Corpus data structure
│   ├── corpus_loader.py        # Load .npz corpus files
│   ├── corpus_builder.py       # Build corpora from text
│   ├── embedding_matcher.py    # Cosine similarity matching
│   ├── text_generator.py       # Main generation pipeline
│   └── corpora/               # Built-in corpus files
│       ├── nature.npz
│       └── literature.npz
├── tests/                      # Test suite
├── pyproject.toml             # Python project config
└── README.md                   # This file

Philosophy

This project treats images and text as inhabitants of the same conceptual space, a space of meaning represented numerically. By pretending that image features are text embeddings, we create a poetic bridge between visual texture and language. Each tree's unique bark pattern, its barkprint, becomes a coordinate in this shared space that points to a specific human expression.

The project encourages me to look at trees and wonder what they might say, which opens a playful additional layer of interaction with the nature around me. It also addresses the interesting assumption that reality, human language, and images of nature could (can?) be compressed to a numerical representation. If everything is represented as numbers and we strip it down to that layer, how universal are these numbers? In this example of mixing image vectors and text vectors, I'm skipping a necessary translation step (one that works amazingly well these days, with CLIP etc.) by treating an image vector as a text vector. This idea makes me think that there are different dialects of numeric language that need translation between each other. There is a visual numeric dialect and a text-based dialect. Both are expressed in numbers, but the numbers mean different things. So in that sense, I think of them as similar to different languages. Anyways, just for fun, and climbing.

License

MIT

Credits

Created as an artistic exploration of the relationship between visual patterns and language through numerical representation.

Dependencies:

sentence-transformers for text embeddings
PIL/Pillow for image processing
NumPy & SciPy for numerical operations

Project details

Version 0.1.0, released Oct 24, 2025

Download files

Source Distribution

barkprints-0.1.0.tar.gz (1.1 MB)

Uploaded Oct 24, 2025 Source

Built Distribution

barkprints-0.1.0-py3-none-any.whl (132.2 kB)

Uploaded Oct 24, 2025 Python 3

File details

Details for the file barkprints-0.1.0.tar.gz.

File metadata

Download URL: barkprints-0.1.0.tar.gz
Upload date: Oct 24, 2025
Size: 1.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.3

File hashes

Hashes for barkprints-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`b74fd5ae4484994896030015b140f8e30c110db9c84ba8321cad867e3629edfb`
MD5	`bfee156038d432054561b1266eb1e640`
BLAKE2b-256	`9e5290786731fc78b054a454cbbd13a687dffacefdce5e078167649493639624`

File details

Details for the file barkprints-0.1.0-py3-none-any.whl.

File metadata

Download URL: barkprints-0.1.0-py3-none-any.whl
Upload date: Oct 24, 2025
Size: 132.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.3

File hashes

Hashes for barkprints-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9c474cb3b4392e45157279252d02b9bf71ea196fbbd2f559d02a5accb60f4dd3`
MD5	`1e72244fb3104052c94b557616284a07`
BLAKE2b-256	`6116fb49db748752383a1e307cb3084d8a4e89e32ad7d09d59e43d58be2e76d8`
