Easiest and fastest way to 1B synthetic tokens
fastdata
Developer Guide
If you are new to using nbdev, here are some useful pointers to get you started.
Install fastdata in Development mode
# make sure fastdata package is installed in development mode
$ pip install -e .
# make changes under nbs/ directory
# ...
# compile to have changes apply to fastdata
$ nbdev_prepare
Usage
Installation
Install latest from the GitHub repository:
$ pip install git+https://github.com/AnswerDotAI/fastdata.git
or from conda
$ conda install -c AnswerDotAI fastdata
or from pypi
$ pip install fastdata
Documentation
Documentation is hosted on this GitHub repository’s pages. Package-manager-specific guidelines are available on conda and pypi respectively.
How to use
First you need to define the structure of the data you want to generate. instructor, which is the library that fastdata uses to generate data, requires you to define the schema of the data you want to generate. This is done using pydantic models.
from pydantic import BaseModel, Field

class Translation(BaseModel):
    english: str = Field(description="An english phrase")
    german: str = Field(description="An equivalent german phrase that is a translation of the english phrase")
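As a quick sanity check of the schema, the following minimal sketch constructs the model by hand, without calling fastdata or any API. It only illustrates what pydantic enforces: both fields are required strings. The sample phrase and the `missing_field_rejected` flag are illustrative and not part of fastdata.

```python
from pydantic import BaseModel, Field, ValidationError


class Translation(BaseModel):
    english: str = Field(description="An english phrase")
    german: str = Field(description="An equivalent german phrase that is a translation of the english phrase")


# A complete record validates and exposes typed attributes.
ok = Translation(english="Good morning", german="Guten Morgen")
print(ok.german)

# A record with a missing required field is rejected by pydantic.
try:
    Translation(english="Good morning")
    missing_field_rejected = False
except ValidationError:
    missing_field_rejected = True
print(missing_field_rejected)
```

The field descriptions matter because they are part of the schema that is handed to the model, so it is worth writing them as short instructions.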
Next, you need to define the prompt that will be used to generate the data and any inputs you want to pass to the prompt.
prompt_template = """\
Generate English and German translations on the following topic:
{topic}
"""
inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]
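To preview the prompts before spending API calls, the template can be rendered locally. This is a minimal sketch that assumes fastdata fills the placeholders with Python's `str.format`, one dictionary per prompt; that is an assumption about the internals, and the `rendered` name is only for illustration.

```python
prompt_template = """\
Generate English and German translations on the following topic:
{topic}
"""

inputs = [{"topic": "Otters are cute"}, {"topic": "I love programming"}]

# Assumption: each input dict supplies the values for the {placeholders}.
rendered = [prompt_template.format(**item) for item in inputs]

for prompt in rendered:
    print(prompt)
```

Each key in an input dictionary must match a placeholder name in the template; with `str.format`, a missing key raises `KeyError`.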
Finally, we can generate some data with fastdata.
[!NOTE]
We only support Anthropic models at the moment. Therefore, make sure you have an API key for the model you want to use and the proper environment variables set, or pass the API key to the FastData class: FastData(api_key="sk-ant-api03-...").
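The two options in the note can be checked up front with a small helper. This is a sketch under assumptions: `ANTHROPIC_API_KEY` is the variable the Anthropic Python SDK reads by default, and it is assumed here that fastdata relies on it. `resolve_api_key` is a hypothetical helper, not part of fastdata.

```python
import os


def resolve_api_key(explicit_key=None):
    """Return the explicit key if given, else ANTHROPIC_API_KEY, else None.

    Hypothetical helper for illustration; not part of fastdata.
    """
    return explicit_key or os.environ.get("ANTHROPIC_API_KEY")


key = resolve_api_key()
if key is None:
    print("No API key found: set ANTHROPIC_API_KEY or pass api_key= to FastData.")
else:
    print("API key found.")
```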
from fastdata.core import FastData
import pprint
# Create a pretty printer object with custom settings
pp = pprint.PrettyPrinter(indent=4, width=100, compact=False)
fast_data = FastData()
translations = fast_data.generate(
    prompt_template=prompt_template,
    inputs=inputs,
    response_model=Translation,
    model="claude-3-haiku-20240307"
)
# Pretty print the translations
print("Translations:")
pp.pprint(translations)
100%|██████████| 2/2 [00:00<00:00, 2.21it/s]
Translations:
[ {'english': 'Otters are cute', 'german': 'Otter sind süß'},
{'english': 'I love programming', 'german': 'Ich liebe das Programmieren'}]
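For larger runs, the results usually need to be saved. The sketch below writes records to a JSON Lines file using only the standard library. It assumes the results are plain dictionaries, as the printed output above suggests; if they are pydantic objects in your version, convert them to dictionaries first. The `write_jsonl` and `read_jsonl` helpers and the output path are illustrative and not part of fastdata.

```python
import json
import os
import tempfile

# Stand-in for the output of fast_data.generate(...) shown above.
translations = [
    {"english": "Otters are cute", "german": "Otter sind süß"},
    {"english": "I love programming", "german": "Ich liebe das Programmieren"},
]


def write_jsonl(records, path):
    """Write one JSON object per line, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dictionaries."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


out_path = os.path.join(tempfile.gettempdir(), "translations.jsonl")
write_jsonl(translations, out_path)
round_trip = read_jsonl(out_path)
print(round_trip == translations)
```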
If you’d like to see how best to generate data with fastdata, check out our blog post here and some of the examples in the examples directory.