Pseudo-random seed data generation for ML/LLM training diversity
liquidrandom
Pseudo-random seed data for ML/LLM training diversity.
Training data generated by LLMs tends to be repetitive and to lack variety. liquidrandom addresses this by providing a large pool of diverse, pre-generated seed data (personas, jobs, scenarios, etc.) that you can inject into your prompts to steer generation toward more varied outputs.
Installation
pip install liquidrandom
# or
uv add liquidrandom
Quick Start
import liquidrandom
# Get a random persona to inject into your LLM prompt
persona = liquidrandom.persona()
print(persona)
# Alice is a 30-year-old female from Canada. They work as an engineer. ...
# Get a random coding task
task = liquidrandom.coding_task()
print(task)
# [Python, medium] Implement a trie: Build a trie data structure ...
Available Categories
| Function | Returns | Description |
|---|---|---|
| `liquidrandom.persona()` | `Persona` | Random personas with name, age, gender, occupation, nationality, personality traits, background |
| `liquidrandom.job()` | `Job` | Professions with title, industry, description, required skills, experience level |
| `liquidrandom.coding_task()` | `CodingTask` | Programming challenges with title, language, difficulty, description, constraints, expected behavior |
| `liquidrandom.math_category()` | `MathCategory` | Math categories with name, field, description, example problems |
| `liquidrandom.writing_style()` | `WritingStyle` | Writing styles with name, tone, characteristics, description |
| `liquidrandom.scenario()` | `Scenario` | Real-world scenarios with title, context, setting, stakes, description |
| `liquidrandom.domain()` | `Domain` | Knowledge domains with name, parent field, description, key concepts |
| `liquidrandom.science_topic()` | `ScienceTopic` | Scientific topics with name, field, subfield, description |
| `liquidrandom.language()` | `Language` | Languages/locales with name, region, register, script, cultural notes |
| `liquidrandom.reasoning_pattern()` | `ReasoningPattern` | Reasoning approaches with name, category, description, when to use |
| `liquidrandom.emotional_state()` | `EmotionalState` | Emotional states with name, intensity, valence, behavioral description |
| `liquidrandom.instruction_complexity()` | `InstructionComplexity` | Instruction complexity levels with level, ambiguity, description, example |
Usage Example
Use liquidrandom to add diversity to your LLM data generation pipeline:
import liquidrandom
from openai import OpenAI
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",
)
persona = liquidrandom.persona()
style = liquidrandom.writing_style()
topic = liquidrandom.science_topic()
prompt = f"""You are {persona}
Write in the following style: {style}
Explain the following topic: {topic}"""
response = client.chat.completions.create(
    model="liquid/lfm-2-24b-a2b",
    messages=[{"role": "user", "content": prompt}],
)
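To generate many prompts rather than one, the same pattern can be wrapped in a loop that draws fresh seeds on every iteration. The following is a minimal sketch, not part of liquidrandom: `build_prompts` and its parameter names are hypothetical, and the seed providers are passed in as callables so the helper can be exercised without downloading any data.

```python
from typing import Callable, List


def build_prompts(
    n: int,
    persona_fn: Callable[[], object],
    style_fn: Callable[[], object],
    topic_fn: Callable[[], object],
) -> List[str]:
    """Draw a fresh seed from each provider for every prompt (hypothetical helper)."""
    prompts = []
    for _ in range(n):
        # Each provider is called once per prompt; the f-string applies str() to the result.
        prompts.append(
            f"You are {persona_fn()}\n"
            f"Write in the following style: {style_fn()}\n"
            f"Explain the following topic: {topic_fn()}"
        )
    return prompts
```

With the package installed, the call would look like `build_prompts(100, liquidrandom.persona, liquidrandom.writing_style, liquidrandom.science_topic)`, and each resulting string can be sent as the user message.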
Each call to a liquidrandom function returns a typed dataclass. You can use them directly in f-strings (via __str__) or access their individual fields:
persona = liquidrandom.persona()
print(persona.name) # "Alice"
print(persona.age) # 30
print(persona.personality_traits) # ["curious", "patient"]
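For readers unfamiliar with the pattern, here is a rough sketch of how a dataclass can support both uses at once. `PersonaSketch` is a hypothetical stand-in with only three of the fields shown above; liquidrandom's actual `Persona` class and its `__str__` output may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class PersonaSketch:
    """Hypothetical stand-in; not liquidrandom's actual Persona class."""

    name: str
    age: int
    personality_traits: List[str] = field(default_factory=list)

    def __str__(self) -> str:
        # Render a short sentence suitable for dropping into a prompt.
        traits = ", ".join(self.personality_traits)
        return f"{self.name} is {self.age} years old. Traits: {traits}."


persona = PersonaSketch(name="Alice", age=30, personality_traits=["curious", "patient"])
print(f"You are {persona}")  # the f-string falls back to __str__
print(persona.name)          # individual fields remain accessible
```

Defining `__str__` by hand keeps the prompt text readable, while the auto-generated `__repr__` still shows every field for debugging.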
How It Works
The dataset contains 340,000+ samples across 12 categories, generated using hierarchical taxonomy trees with LLM-based quality validation and fuzzy deduplication.
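The description does not say which similarity measure or threshold the fuzzy deduplication uses. As an illustration of the general idea only, the sketch below drops any string whose character-level similarity to an already kept string reaches a threshold; `fuzzy_dedup`, the use of `difflib`, and the `0.9` default are all assumptions.

```python
from difflib import SequenceMatcher
from typing import List


def fuzzy_dedup(items: List[str], threshold: float = 0.9) -> List[str]:
    """Keep an item only if it is not too similar to any item already kept."""
    kept: List[str] = []
    for item in items:
        normalized = item.lower().strip()
        # Compare against everything kept so far (quadratic; fine for a small demo).
        is_duplicate = any(
            SequenceMatcher(None, normalized, other.lower().strip()).ratio() >= threshold
            for other in kept
        )
        if not is_duplicate:
            kept.append(item)
    return kept


samples = [
    "Implement a trie data structure",
    "Implement a trie data structure.",
    "Explain photosynthesis to a child",
]
print(fuzzy_dedup(samples))
```

A pairwise comparison like this would be too slow for hundreds of thousands of samples; at that scale an approximate method such as MinHash is the more usual choice.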
Seed data is hosted on HuggingFace (mlech26l/liquidrandom-data) as zstd-compressed Parquet files. On first use, only the requested category file is downloaded and cached locally. Subsequent calls use the cached data.
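The download-once behaviour described above can be sketched as follows. This is not the library's code: `cached_category_path`, the `<category>.parquet` file naming, and the `download` callback are placeholders chosen for illustration.

```python
from pathlib import Path
from typing import Callable


def cached_category_path(
    category: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
) -> Path:
    """Return the local file for a category, downloading it only if it is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: one file per category.
    target = cache_dir / f"{category}.parquet"
    if not target.exists():
        # First use of this category: fetch just this one file.
        download(category, target)
    return target
```

Because the check is per file, requesting `persona` never triggers a download of the other categories, and a second request for `persona` touches only the local cache.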
License
MIT
File details
Details for the file liquidrandom-0.1.0.tar.gz.
File metadata
- Download URL: liquidrandom-0.1.0.tar.gz
- Upload date:
- Size: 86.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dee05c1aad613b4575fc7b2ed27e7feb471128ab23a563bcda2eecdb90763d62 |
| MD5 | c8f3b6d9dfd8f92102ffc754090b61fc |
| BLAKE2b-256 | 1c2fd9ede0a7f2a81592ea69d291be50586fccd0fecb291827b6622aa1b3cf4e |
File details
Details for the file liquidrandom-0.1.0-py3-none-any.whl.
File metadata
- Download URL: liquidrandom-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e979fe7c6f5356012b01acfe693ee6a97b56a09f8c44342a40f05b44ae61493f |
| MD5 | 60844b3857c9ac2eed52d523b21830ec |
| BLAKE2b-256 | a845b50af538683f10c2c56acff7f871f1729e3b33352c825e132ecab010ab1f |