Pseudo-random seed data generation for ML/LLM training diversity

liquidrandom

Pseudo-random seed data for ML/LLM training diversity.

When LLMs are used to generate training data, their outputs tend to be repetitive and lack variety. liquidrandom addresses this by providing a large pool of diverse, pre-generated seed data (personas, jobs, scenarios, and more) that you can inject into your prompts to steer generation toward more varied outputs.

Installation

pip install liquidrandom
# or
uv add liquidrandom

Quick Start

import liquidrandom

# Get a random persona to inject into your LLM prompt
persona = liquidrandom.persona()
print(persona)
# Alice is a 30-year-old female from Canada. They work as an engineer. ...

# Get a random coding task
task = liquidrandom.coding_task()
print(task)
# [Python, medium] Implement a trie: Build a trie data structure ...

Available Categories

| Function | Returns | Description |
|---|---|---|
| liquidrandom.persona() | Persona | Random personas with name, age, gender, occupation, nationality, personality traits, background |
| liquidrandom.job() | Job | Professions with title, industry, description, required skills, experience level |
| liquidrandom.coding_task() | CodingTask | Programming challenges with title, language, difficulty, description, constraints, expected behavior |
| liquidrandom.math_category() | MathCategory | Math categories with name, field, description, example problems |
| liquidrandom.writing_style() | WritingStyle | Writing styles with name, tone, characteristics, description |
| liquidrandom.scenario() | Scenario | Real-world scenarios with title, context, setting, stakes, description |
| liquidrandom.domain() | Domain | Knowledge domains with name, parent field, description, key concepts |
| liquidrandom.science_topic() | ScienceTopic | Scientific topics with name, field, subfield, description |
| liquidrandom.language() | Language | Languages/locales with name, region, register, script, cultural notes |
| liquidrandom.reasoning_pattern() | ReasoningPattern | Reasoning approaches with name, category, description, when to use |
| liquidrandom.emotional_state() | EmotionalState | Emotional states with name, intensity, valence, behavioral description |
| liquidrandom.instruction_complexity() | InstructionComplexity | Instruction complexity levels with level, ambiguity, description, example |
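
To vary prompts further, you can draw from a different subset of these categories for each prompt. The sketch below assumes only that each sampler is a zero-argument callable whose result renders via str(), as the functions listed here appear to be; sample_seeds is a hypothetical helper, not part of liquidrandom.

```python
import random
from typing import Callable, Dict, Optional


def sample_seeds(
    samplers: Dict[str, Callable[[], object]],
    k: int,
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Pick k distinct categories at random and draw one item from each.

    Hypothetical helper: returns a mapping of category name -> rendered text.
    """
    rng = rng or random.Random()
    if not 0 <= k <= len(samplers):
        raise ValueError(f"k must be between 0 and {len(samplers)}")
    # Sort the keys so that a seeded rng gives the same choice on every run.
    chosen = rng.sample(sorted(samplers), k)
    return {name: str(samplers[name]()) for name in chosen}
```

With the library installed, the mapping could be {"persona": liquidrandom.persona, "scenario": liquidrandom.scenario, "writing_style": liquidrandom.writing_style}. Only the choice of categories depends on the rng passed here; how liquidrandom picks individual items is outside this sketch.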

Usage Example

Use liquidrandom to add diversity to your LLM data generation pipeline:

import liquidrandom
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",
)

persona = liquidrandom.persona()
style = liquidrandom.writing_style()
topic = liquidrandom.science_topic()

prompt = f"""You are {persona}
Write in the following style: {style}
Explain the following topic: {topic}"""

response = client.chat.completions.create(
    model="liquid/lfm-2-24b-a2b",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)

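When generating at scale, it helps to store the seed values next to each request so that a generated sample can later be traced back to the persona, style, or topic that produced it. The following is a minimal sketch under that assumption; build_requests and the "seeds" key are hypothetical names, and the samplers can be any zero-argument callables, such as liquidrandom functions.

```python
from typing import Callable, Dict, List


def build_requests(
    n: int,
    samplers: Dict[str, Callable[[], object]],
    template: str,
) -> List[dict]:
    """Build n chat requests and record the seed values behind each one.

    `template` uses str.format placeholders named after the sampler keys,
    e.g. "You are {persona}. Explain the following topic: {topic}".
    """
    requests = []
    for _ in range(n):
        # Draw fresh seed data for every request and render it to text.
        seeds = {name: str(draw()) for name, draw in samplers.items()}
        requests.append(
            {
                "messages": [{"role": "user", "content": template.format(**seeds)}],
                "seeds": seeds,  # provenance: which seeds produced this prompt
            }
        )
    return requests
```

Each request's messages list has the same shape as in the OpenRouter example, so it could be sent with client.chat.completions.create(model=..., messages=r["messages"]) and the response stored alongside r["seeds"].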
Each liquidrandom function returns a typed dataclass instance. You can use it directly in an f-string (it defines __str__) or access its individual fields:

persona = liquidrandom.persona()
print(persona.name)                # e.g. "Alice"
print(persona.age)                 # e.g. 30
print(persona.personality_traits)  # e.g. ["curious", "patient"]
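
To illustrate the pattern (typed fields plus a __str__ that yields prompt-ready text), here is a hypothetical stand-in for Persona. It is not liquidrandom's actual class: the field list is taken from the category table, and the rendering only imitates the sample output shown in Quick Start.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PersonaSketch:
    """Hypothetical stand-in mirroring the Persona fields listed in the table."""

    name: str
    age: int
    gender: str
    occupation: str
    nationality: str
    personality_traits: List[str] = field(default_factory=list)
    background: str = ""

    def __str__(self) -> str:
        # Render a prose description so the object drops straight into an f-string.
        text = (
            f"{self.name} is a {self.age}-year-old {self.gender} "
            f"from {self.nationality}. They work as {self.occupation}."
        )
        if self.personality_traits:
            text += f" Traits: {', '.join(self.personality_traits)}."
        if self.background:
            text += f" {self.background}"
        return text
```

Because __str__ is defined, f"You are {persona}" embeds the full description, while persona.personality_traits stays available as a list for filtering or logging.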

How It Works

The dataset contains 340,000+ samples across 12 categories, generated using hierarchical taxonomy trees with LLM-based quality validation and fuzzy deduplication.
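
The README does not describe this pipeline further. As a generic illustration of the two named techniques, not liquidrandom's actual code, the sketch below enumerates root-to-leaf paths in a nested taxonomy (each leaf could become a generation target) and drops near-duplicates, using difflib.SequenceMatcher as a stand-in for whatever fuzzy matcher the authors used. The 0.9 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple, Union

# A taxonomy node is either a dict of named subtrees or a list of leaf topics.
Taxonomy = Union[Dict[str, "Taxonomy"], List[str]]


def leaf_paths(tree: Taxonomy, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Enumerate root-to-leaf paths, e.g. ("Science", "Physics", "Optics")."""
    if isinstance(tree, list):
        return [prefix + (leaf,) for leaf in tree]
    paths: List[Tuple[str, ...]] = []
    for name, subtree in tree.items():
        paths.extend(leaf_paths(subtree, prefix + (name,)))
    return paths


def _normalize(text: str) -> str:
    # Case-fold and collapse whitespace before comparing.
    return " ".join(text.lower().split())


def fuzzy_dedup(texts: List[str], threshold: float = 0.9) -> List[str]:
    """Keep each text unless it is too similar to one already kept.

    Pairwise comparison is quadratic; a dataset of hundreds of thousands of
    samples would likely need an indexed method such as MinHash/LSH instead.
    """
    kept: List[str] = []
    kept_norm: List[str] = []
    for text in texts:
        norm = _normalize(text)
        if all(SequenceMatcher(None, norm, k).ratio() < threshold for k in kept_norm):
            kept.append(text)
            kept_norm.append(norm)
    return kept
```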

Seed data is hosted on Hugging Face (mlech26l/liquidrandom-data) as zstd-compressed Parquet files, one per category. On first use of a category, only that category's file is downloaded and cached locally; subsequent calls read from the cache.
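
The lazy, per-category caching described here can be sketched as follows. LazyCategoryCache, the "<category>.parquet" file naming, and the fetch callback are hypothetical; the actual file names in mlech26l/liquidrandom-data are not documented in this README.

```python
from pathlib import Path
from typing import Callable


class LazyCategoryCache:
    """Fetch each category file at most once, then reuse the local copy.

    `fetch(category, dest)` is a hypothetical callback that writes the file
    for `category` to `dest`.
    """

    def __init__(self, cache_dir: Path, fetch: Callable[[str, Path], None]):
        self.cache_dir = Path(cache_dir)
        self.fetch = fetch

    def path_for(self, category: str) -> Path:
        # Hypothetical naming scheme; the real dataset layout may differ.
        dest = self.cache_dir / f"{category}.parquet"
        if not dest.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            self.fetch(category, dest)  # runs only on first use of this category
        return dest
```

In practice, fetch might wrap huggingface_hub.hf_hub_download(repo_id="mlech26l/liquidrandom-data", repo_type="dataset", filename=...), which maintains its own cache, and the file could then be read with pandas.read_parquet. A more robust version would download to a temporary file and rename it once complete, so that an interrupted download is not mistaken for a cached file.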

License

MIT

Download files

Source Distribution

liquidrandom-0.1.0.tar.gz (86.7 kB)

Built Distribution

liquidrandom-0.1.0-py3-none-any.whl (11.5 kB)

File details

Details for the file liquidrandom-0.1.0.tar.gz.

File metadata

  • Download URL: liquidrandom-0.1.0.tar.gz
  • Size: 86.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for liquidrandom-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | dee05c1aad613b4575fc7b2ed27e7feb471128ab23a563bcda2eecdb90763d62 |
| MD5 | c8f3b6d9dfd8f92102ffc754090b61fc |
| BLAKE2b-256 | 1c2fd9ede0a7f2a81592ea69d291be50586fccd0fecb291827b6622aa1b3cf4e |

File details

Details for the file liquidrandom-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: liquidrandom-0.1.0-py3-none-any.whl
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for liquidrandom-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | e979fe7c6f5356012b01acfe693ee6a97b56a09f8c44342a40f05b44ae61493f |
| MD5 | 60844b3857c9ac2eed52d523b21830ec |
| BLAKE2b-256 | a845b50af538683f10c2c56acff7f871f1729e3b33352c825e132ecab010ab1f |
