Easiest and fastest way to 1B synthetic tokens

fastdata

Developer Guide

If you are new to nbdev, here are some useful pointers to get you started.

Install fastdata in Development mode

# make sure fastdata package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to fastdata
$ nbdev_prepare

Usage

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/AnswerDotAI/fastdata.git

or from conda

$ conda install -c AnswerDotAI fastdata

or from PyPI (the distribution is published as python-fastdata, while the import name remains fastdata)

$ pip install python-fastdata

Documentation

Documentation is hosted on this GitHub repository’s pages. Package-manager-specific guidelines are available on conda and PyPI.

How to use

First, define the structure of the data you want to generate. fastdata uses the instructor library to generate data, and instructor requires a schema for that data. You define the schema as a pydantic model.

from pydantic import BaseModel, Field

class Translation(BaseModel):
    english: str = Field(description="An english phrase")
    german: str = Field(description="An equivalent german phrase that is a translation of the english phrase")
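As a quick illustration of what the schema buys you, the sketch below builds one valid Translation by hand and shows that pydantic rejects a record with a missing field. It only exercises pydantic itself and makes no claim about how fastdata or instructor use the model internally.

```python
from pydantic import BaseModel, Field, ValidationError


class Translation(BaseModel):
    english: str = Field(description="An english phrase")
    german: str = Field(description="An equivalent german phrase that is a translation of the english phrase")


# A complete record validates and exposes its fields as attributes.
ok = Translation(english="Otters are cute", german="Otter sind süß")
print(ok.english, "->", ok.german)

# A record without the required "german" field is rejected.
try:
    Translation(english="Otters are cute")
except ValidationError as err:
    missing_field_rejected = True
    print(err)
else:
    missing_field_rejected = False
```

Because both fields are required strings, any generated item that lacks one of them fails validation instead of silently entering the dataset.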

Next, you need to define the prompt that will be used to generate the data and any inputs you want to pass to the prompt.

prompt_template = """\
Generate English and German translations on the following topic:
{topic}
"""

inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]
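To make the relationship between the template and the inputs concrete, here is a minimal sketch of how each input dict could be expanded into a final prompt. It assumes plain `str.format` substitution, which is an assumption for illustration and not a description of fastdata's internals; `render_prompts` is a hypothetical helper.

```python
prompt_template = """\
Generate English and German translations on the following topic:
{topic}
"""

inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]


def render_prompts(template, rows):
    """Fill the template once per input dict (hypothetical helper, not part of fastdata)."""
    # Assumption: placeholders are filled with str.format-style substitution.
    return [template.format(**row) for row in rows]


prompts = render_prompts(prompt_template, inputs)
for prompt in prompts:
    print(prompt)
```

Each key in an input dict must match a placeholder name in the template, so an input such as {"subject": "..."} would raise a KeyError with this template.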

Finally, we can generate some data with fastdata.

[!NOTE]

Only Anthropic models are supported at the moment. Make sure you have an API key for the model you want to use, and either set the proper environment variable or pass the key directly to the FastData class: FastData(api_key="sk-ant-api03-...").
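The two ways of supplying a key can be sketched as a small lookup that prefers an explicit key and falls back to the environment. `resolve_api_key` is a hypothetical helper, and the variable name `ANTHROPIC_API_KEY` is the one the Anthropic SDK conventionally reads; that fastdata relies on the same variable is an assumption here.

```python
import os


def resolve_api_key(explicit_key=None):
    """Return the explicit key if given, else the environment value (hypothetical helper)."""
    # Assumption: ANTHROPIC_API_KEY is the environment variable in use.
    key = explicit_key or os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: set ANTHROPIC_API_KEY or pass the key explicitly."
        )
    return key
```

The returned value could then be passed on as FastData(api_key=resolve_api_key()), which fails early with a clear message when no key is configured.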

from fastdata.core import FastData

import pprint

# Create a pretty printer object with custom settings
pp = pprint.PrettyPrinter(indent=4, width=100, compact=False)

fast_data = FastData()
translations = fast_data.generate(
    prompt_template=prompt_template,
    inputs=inputs,
    response_model=Translation,
    model="claude-3-haiku-20240307"
)

# Pretty print the translations
print("Translations:")
pp.pprint(translations)

100%|██████████| 2/2 [00:00<00:00,  2.21it/s]

Translations:
[   {'english': 'Otters are cute', 'german': 'Otter sind süß'},
    {'english': 'I love programming', 'german': 'Ich liebe das Programmieren'}]
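Once generated, the records usually need to be stored somewhere. The sketch below writes records shaped like the printed output above to a JSON Lines file and reads them back. It assumes the results are plain dicts, as the printout suggests; `save_jsonl` and `load_jsonl` are hypothetical helpers and not part of fastdata.

```python
import json
import tempfile
from pathlib import Path

# Sample records copied from the output shown above.
translations = [
    {"english": "Otters are cute", "german": "Otter sind süß"},
    {"english": "I love programming", "german": "Ich liebe das Programmieren"},
]


def save_jsonl(records, path):
    """Write one JSON object per line (hypothetical helper)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False keeps characters such as "ü" and "ß" readable.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_jsonl(path):
    """Read the records back, skipping blank lines (hypothetical helper)."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "translations.jsonl"
    save_jsonl(translations, out_path)
    restored = load_jsonl(out_path)

print(restored)
```

JSON Lines is used here because records can be appended one at a time, which suits datasets that are generated in batches.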

If you’d like to see how best to generate data with fastdata, check out our blog post here and some of the examples in the examples directory.
