Skip to main content

Feature discovery and generation utilities

Project description

LLM_feature_gen

LLM Feature Gen is a Python library for discovering and generating interpretable features from unstructured data using Large Language Models (LLMs).
The library provides high-level utilities for:

  • Discovering human-interpretable features from sets of images,
  • Integrating prompts and model outputs into structured JSON representations,
    • Generating new feature representations automatically from raw multimodal data, e.g., creating structured tables for downstream models,

Module: discover

The discover module focuses on feature discovery — identifying interpretable, discriminative visual or textual properties using an LLM.

✅ What it does

Given a folder of images and a prompt, the library:

  1. Converts each image into Base64 format,
  2. Sends them to an LLM,
  3. Receives a structured JSON response describing the discovered features,
  4. Automatically saves the output to a JSON file in outputs/.

📂 Project Structure

LLM_feature_gen/
├─ src/
│  └─ LLM_feature_gen/
│     ├─ init.py
│     ├─ discover.py                # High-level orchestration for feature discovery
│     ├─ providers/
          ├─ openai_provider.py     # OpenAI API wrapper
│         ├─ local_provider.py      # Local LLM wrapper
│     ├─ prompts/
│     │   ├─ discovery_prompt.txt   # Default reasoning prompt
          ├─ generation_prompt.txt  # Default feature generation prompt
│     ├─ utils/
│     │   └─ image.py               # Image → base64 conversion
│     └─ tests/
│        └─ test_discover.py
├─ outputs/                         # Automatically generated feature JSONs
├─ pyproject.toml
└─ README.md

⚙️ Installation

Clone or download the repository, then install in editable mode:

pip install -e .

🔑 Environment Setup for OpenAI API

Create a .env file in the project root

Example: Discover Features from Images

from LLM_feature_gen.discover import discover_features_from_images
# Folder with your example images
image_folder = "discover_images"

# Run feature discovery
result = discover_features_from_images(
    image_paths_or_folder=image_folder,
    as_set=True,  # analyze all images jointly
)

print(result)

This will:

  • Read all .jpg/.png images from discover_images/
  • the default prompt (prompts/image_discovery_prompt.txt)
  • Send them to your LLM provider
  • Save the results to outputs/discovered_features_.json

Example saved JSON:

{
  "proposed_features": [
    {
      "feature": "has visible handle",
      "description": "Some objects include handles, others do not.",
      "possible_values": ["present", "absent"]
    },
    {
      "feature": "color tone",
      "description": "Images vary between metallic and earthy color palettes.",
      "possible_values": ["metallic", "matte", "bright", "dark"]
    }
  ]
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_feature_gen-0.1.0.tar.gz (14.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llm_feature_gen-0.1.0-py3-none-any.whl (16.7 kB view details)

Uploaded Python 3

File details

Details for the file llm_feature_gen-0.1.0.tar.gz.

File metadata

  • Download URL: llm_feature_gen-0.1.0.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.7

File hashes

Hashes for llm_feature_gen-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c992ed6129716d5719a2b08c86b0e07633f52afcf6bbd2a55258c9d28dba0f48
MD5 f8df7f8db9cee1984a9b62365a9defb7
BLAKE2b-256 bd974728bbe280c9356158a510b67730d5d6c9653082e5535c140490aa08fd09

See more details on using hashes here.

File details

Details for the file llm_feature_gen-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for llm_feature_gen-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b50b897519d83e4117c93e995f2dd2e564a4ab294c02e0092ad9f48b49261f7b
MD5 296ab2da6ad1f6699b1910c0a6debeeb
BLAKE2b-256 a360143eeb80a56310b7bb29e45a8638660bc777fa9c0c51f999bd8135f1f3a1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page