
llmfsd: LLM Fake Structured Data, faking Structured Data from any LLM

Project description

llmfsd: LLM Fake Structured Data

llmfsd is a Python package for generating fake structured data with any Large Language Model (LLM). You write SQL-like queries and receive simulated structured data in formats such as JSON or CSV. The tool is highly customizable and supports multiple AI providers (thanks to aisuite).

Features

  • Generate fake structured data via SQL queries.
  • Supports JSON and CSV output formats.
  • Define custom data models to control schema and descriptions.
  • Integrates with various AI providers (e.g., OpenAI, Mistral, Google, Anthropic).

Installation

Install llmfsd using pip:

pip install llmfsd

Install a Provider’s Package Along with aisuite

llmfsd supports all AI providers supported by aisuite. If you have not already installed the provider’s package, you can do so along with llmfsd. For example:

pip install "llmfsd[mistral]"

Alternatively, you can install the provider’s package directly with aisuite:

pip install "aisuite[mistral]"

For more details, visit the aisuite repository.

Usage

Basic Example

Here’s a simple example to get started:

from llmfsd import Faker

# Initialize Faker with your LLM model_id (aisuite id format)
faker = Faker(model_id="mistral:mistral-large-latest")

# Generate JSON data
print(faker.json("SELECT uuid, name FROM phone_brands LIMIT 4"))

"""
Output:
[
 {'uuid': 'f47ac10b-58cc-4372-a567-0e02b2c3d479', 'name': 'Nokia'},
 {'uuid': 'f7bac13b-58cc-4372-a567-0e02b2c3d479', 'name': 'Samsung'}, 
 {'uuid': 'f98ac12b-58cc-4372-a567-0e02b2c3d479', 'name': 'Apple'},
 {'uuid': 'f47ac10b-58cc-4972-a567-0e02b2c3d479', 'name': 'Sony'}
]
"""

# Generate CSV data
print(faker.csv("SELECT id, color FROM colors LIMIT 2"))

"""
Output:
id,color
1,red
2,blue
"""

More Advanced Example with Data Models

You can define custom data models to control the structure of your fake data.

from llmfsd import Faker, DataModel

# Define data models

model = DataModel("dogs", 
    {"id": "Number in range(5,20)", "name": None, "breed": "Breed of the dog"}
)

# Initialize Faker with data models
faker = Faker(model_id="mistral:mistral-large-latest", data_models=[model])

# Generate JSON data for a specific model
print(faker.json("SELECT * FROM dogs LIMIT 3"))

"""
Output:
[
  {
    "id": 7,
    "name": "Buddy",
    "breed": "Labrador"
  },
  {
    "id": 12,
    "name": "Charlie",
    "breed": "Golden Retriever"
  },
  {
    "id": 15,
    "name": "Max",
    "breed": "German Shepherd"
  }
]
"""

AI Providers

To use a different provider, set the model_id parameter during Faker initialization, using the aisuite id format (provider:model).

Examples

faker1 = Faker(model_id="groq:llama-3.2-3b-preview")

faker2 = Faker(model_id="openai:gpt-3.5-turbo")

faker3 = Faker(model_id="huggingface:mistralai/Mistral-7B-Instruct-v0.3")

Each provider requires its own API key. Use environment variables or configuration files to store your API keys securely. For example, the Mistral provider requires MISTRAL_API_KEY:

export MISTRAL_API_KEY="your-mistral-api-key"
export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"
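The keys can also be set from within Python before constructing Faker, since aisuite reads them from the environment. A minimal sketch with a placeholder value:

```python
import os

# Equivalent to `export MISTRAL_API_KEY=...` in the shell; set this before
# constructing Faker so aisuite can pick the key up from the environment.
os.environ["MISTRAL_API_KEY"] = "your-mistral-api-key"  # placeholder, not a real key

# faker = Faker(model_id="mistral:mistral-large-latest")  # would now authenticate
```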

Methods

json(query: str, output: Optional[str] = None) -> list[dict] | None

Generate fake structured data in JSON format.

  • query: The SQL query to execute.
  • output: File path to save the JSON output. If None, returns the data directly.

csv(query: str, output: Optional[str] = None) -> str | None

Generate fake structured data in CSV format.

  • query: The SQL query to execute.
  • output: File path to save the CSV output. If None, returns the data directly.

Custom Data Models

You can create custom schemas using DataModel, defining either a list of attributes or a dictionary with descriptions.

DataModel allows you to use * as a wildcard in queries or provide minimal descriptions for your attributes to the LLM.

Avoid unnecessary descriptions, as they increase token consumption. If the attributes are self-explanatory to the LLM, use a plain list. When using a dictionary-based schema, you can set an attribute's value to None and provide descriptions only for the attributes you wish to clarify.

Example:

from llmfsd import DataModel

# Schema as a list

model1 = DataModel("cars", ["brand", "model", "year"])

# Schema as a dictionary

model2 = DataModel("pets", {
    "id" : "uuid string",
    "name": None,
    "age":  None,
    "species": "Type of pet (e.g., dog, cat)"
})

Pass these models to Faker during initialization:

faker = Faker(model_id="openai:gpt-4o", data_models=[model1, model2])

Saving Output to a File

Both json and csv methods support saving results directly to a file.

# Save JSON data to a file

faker.json("SELECT * FROM artists LIMIT 20", output="artists.json")

# Save CSV data to a file

faker.csv("SELECT name, age FROM pets LIMIT 20", output="pets.csv")

GitHub

https://github.com/dinyad-prog00/llmfsd

Project details


Download files


Source Distribution

llmfsd-0.1.1.tar.gz (6.0 kB)

Uploaded Source

Built Distribution


llmfsd-0.1.1-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file llmfsd-0.1.1.tar.gz.

File metadata

  • Download URL: llmfsd-0.1.1.tar.gz
  • Upload date:
  • Size: 6.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.10.15 Darwin/24.0.0

File hashes

Hashes for llmfsd-0.1.1.tar.gz
Algorithm Hash digest
SHA256 082a0f89bc9237c036aa723a2e6b442f65525f31c2619fa78fda30ceea744dd4
MD5 20f5e18bc320c8952feb1ae7cae6c999
BLAKE2b-256 8cbcee6871f308664509cc0c89ef56c3389cf15043faed7829039584a9c1e70d


File details

Details for the file llmfsd-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: llmfsd-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.10.15 Darwin/24.0.0

File hashes

Hashes for llmfsd-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4adf1a2e814d73e3dd4e2d44845d5880616dd783a6d86cef4078023cf722b27e
MD5 db841c0dc0c8a1b588056c18516f4954
BLAKE2b-256 9f1d55d51eb53d86177bad16ddba8128c66f5b7206634191286c597c7d6f61b8

