sciphi: A framework for synthetic data.

Project description

SciPhi [ΨΦ]: A Framework for LLM Powered Data

SciPhi is a Python-based framework designed to facilitate the generation of high-quality synthetic data tailored for both Large Language Models (LLMs) and human users. This suite offers:

- Configurable Data Generation: Craft LLM-mediated datasets according to your specifications.
- Retrieval-Augmented Generation (RAG) Integration: Use the integrated RAG Provider API, which comes bundled with an evaluation harness for grounding your generated data against real-world datasets.
- Textbook Generation Module: Generate RAG-augmented synthetic textbooks directly from a given table of contents.

Fast Setup

Install SciPhi via pip:

Base Installation:

pip install sciphi

Optional Dependencies:

Install with specific optional support using extras:

- Anthropic: 'sciphi[anthropic_support]'
- HF (includes Torch): 'sciphi[hf_support]'
- Llama-CPP: 'sciphi[llama_cpp_support]'
- Llama-Index: 'sciphi[llama_index_support]'
- VLLM (includes Torch): 'sciphi[vllm_support]'

Recommended (All Optional Dependencies):

pip install 'sciphi[all_with_extras]'

Note: Depending on your shell, you might need to use quotes around the package name and extras to avoid globbing.
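
After installing, it can be handy to confirm which optional backends are actually importable. The sketch below is only an illustration: the mapping from each extra to a top-level module name (`anthropic`, `transformers`, `llama_cpp`, `llama_index`, `vllm`) is an assumption based on the packages listed for those extras, and `available_extras` is a hypothetical helper, not part of SciPhi.

```python
from importlib.util import find_spec

# Assumed mapping from SciPhi extras to the top-level module each one installs.
EXTRA_MODULES = {
    "anthropic_support": "anthropic",
    "hf_support": "transformers",
    "llama_cpp_support": "llama_cpp",
    "llama_index_support": "llama_index",
    "vllm_support": "vllm",
}


def available_extras(modules=EXTRA_MODULES):
    """Return {extra_name: bool}, True when the extra's module can be imported."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


if __name__ == "__main__":
    for extra, present in available_extras().items():
        print(f"{extra}: {'installed' if present else 'missing'}")
```

Because `find_spec` only looks the module up without importing it, the check stays fast even for heavy packages such as Torch-backed ones.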

Features

Community & Support

Engage with our vibrant community on Discord.
For tailored inquiries or feedback, please email us.

Textbook Generation (The Library of Phi)

The Library of Phi is an effort to democratize access to top-tier textbooks. Using LLM generation with optional RAG grounding, it aims to produce factual, high-quality educational materials. The same approach extends readily to other domains, such as internal commercial documents.

Generating Textbooks

Dry Run:

python -m sciphi.scripts.generate_textbook dry_run

Default Textbook Generation:
```
python -m sciphi.scripts.generate_textbook run --textbook=Aerodynamics_of_Viscous_Fluids --rag-enabled=False
```
Use the rag-enabled setting to toggle RAG augmentation of the textbook on or off. Additional arguments let you customize the RAG provider.

See a sample output here.

Example With a Custom Table of Contents:

Prepare your table of contents and save it to a file such as $PWD/toc/test.yaml. Then run the following command:
```
python -m sciphi.scripts.generate_textbook run --toc_dir=toc --output_dir=books --data_dir=$PWD
```
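If you launch generation from a Python script rather than a shell, the same invocation can be assembled as an argument list. This is a minimal sketch: `build_textbook_command` is a hypothetical helper, it uses only the flags shown in this README, and combining `--rag-enabled` with the directory flags in one call is an assumption.

```python
import shlex
import sys


def build_textbook_command(toc_dir, output_dir, data_dir, rag_enabled=False):
    """Assemble the generate_textbook invocation from the flags shown in this README."""
    return [
        sys.executable, "-m", "sciphi.scripts.generate_textbook", "run",
        f"--toc_dir={toc_dir}",
        f"--output_dir={output_dir}",
        f"--data_dir={data_dir}",
        f"--rag-enabled={rag_enabled}",
    ]


command = build_textbook_command("toc", "books", ".")
# Print the shell-quoted command for inspection instead of running it.
print(shlex.join(command))
# To execute it, pass the list to subprocess.run(command, check=True).
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.
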
Activating RAG Functionality:

Set rag-enabled to True. Ensure you have the right .env variables set up, or provide CLI values for rag_api_base and rag_api_key.
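
The two ways of supplying RAG credentials can be pictured as a simple fallback chain. The sketch below is an assumption-laden illustration, not SciPhi's actual logic: the environment variable names `RAG_API_BASE` and `RAG_API_KEY` are hypothetical (check .env.example for the real ones), and giving explicit values precedence over the environment is an assumed convention.

```python
import os


def resolve_rag_settings(rag_api_base=None, rag_api_key=None, env=None):
    """Prefer explicit CLI-style values, then fall back to environment variables."""
    env = os.environ if env is None else env
    # RAG_API_BASE / RAG_API_KEY are assumed names, used here for illustration only.
    base = rag_api_base or env.get("RAG_API_BASE")
    key = rag_api_key or env.get("RAG_API_KEY")
    resolved = {"rag_api_base": base, "rag_api_key": key}
    missing = [name for name, value in resolved.items() if not value]
    if missing:
        raise ValueError(f"RAG is enabled but these settings are missing: {', '.join(missing)}")
    return resolved
```

Failing early with a clear message is usually friendlier than letting a long generation run stop at its first RAG request.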

Important: To make the most out of grounding your data with Wikipedia, ensure your system matches our detailed specifications. We offer additional examples and resources here.

RAG Eval Harness

To measure the efficacy of your RAG pipeline, SciPhi provides a RAG evaluation harness.

Deploying the RAG Harness

Initiate the Harness:

python -m sciphi.scripts.rag_harness --n-samples=100 --rag-enabled=True --evals_to_run="science_multiple_choice"

Local Development

Clone the Repository:

Begin by cloning the repository and stepping into the project directory:
```
git clone https://github.com/emrgnt-cmplxty/sciphi.git
cd sciphi
```
Install the Dependencies:

Start by installing the primary requirements:
```
pip install -r requirements.txt
```
If you need additional functionality, consider the following:
- For the developer's toolkit and utilities:
```
pip install -r requirements-dev.txt
```
- To install all optional dependencies:
```
pip install -r requirements_all.txt
```
Alternatively, to manage packages using Poetry:
```
poetry install
```
And for optional dependencies with Poetry, pass one of the extras, all or all_with_extras:
```
poetry install -E all_with_extras
```
Setting Up Your Environment:

Begin by duplicating the sample environment file to craft your own:
```
cp .env.example .env
```
Next, use a text editor to adjust the .env file with your specific configurations. An example with vim is shown below:
```
vim .env
```
After entering your settings, ensure you save and exit the file.
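
A quick way to confirm nothing was dropped while editing is to compare the variable names in the two files. The following is a minimal sketch with a deliberately simple parser (it ignores quoting and multi-line values); `env_keys` and `missing_keys` are hypothetical helpers, not part of SciPhi.

```python
from pathlib import Path


def env_keys(path):
    """Collect variable names from a dotenv-style file, skipping blanks and comments."""
    keys = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate the optional shell-style "export NAME=value" form.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)
    return keys


def missing_keys(example_path=".env.example", env_path=".env"):
    """Names defined in the example file but absent from the local .env."""
    return sorted(env_keys(example_path) - env_keys(env_path))
```

Calling `missing_keys()` from the repository root returns an empty list when every variable from .env.example also appears in .env.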

System Requirements

Essential Packages:

Python Version: >=3.9,<3.12
Required Libraries:
- bs4: ^0.0.1
- fire: ^0.5.0
- openai: 0.27.8
- pandas: ^2.1.0
- python-dotenv: ^1.0.0
- pyyaml: ^6.0.1
- retrying: ^1.3.4
- sentencepiece: ^0.1.99
- torch: ^2.1.0
- tiktoken: ^0.5.1
- tqdm: ^4.66.1
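
The Python constraint above can be checked before installing anything. This is a small sketch; `python_supported` is a hypothetical helper name.

```python
import sys


def python_supported(version_info=sys.version_info):
    """True when the interpreter satisfies the >=3.9,<3.12 constraint."""
    return (3, 9) <= tuple(version_info[:2]) < (3, 12)


if not python_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside >=3.9,<3.12")
```

Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings, where "3.10" would sort before "3.9".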

Supplementary Packages:

Anthropic Integration:
- anthropic: ^0.3.10
Hugging Face Tools:
- accelerate: ^0.23.0
- datasets: ^2.14.5
- transformers: ^4.33.1
Llama-Index:
- llama-index: ^0.8.29.post1
Llama-CPP:
- llama-cpp-python: ^0.2.11
VLLM Tools:
- vllm: 0.2.0
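
The `^` entries in the lists above follow Poetry's caret convention: updates are allowed as long as the leftmost non-zero version component stays the same. Entries without a caret, such as openai: 0.27.8 and vllm: 0.2.0, are exact pins. The sketch below expands a caret constraint into its lower and exclusive upper bound; `caret_range` is a hypothetical helper written for illustration and handles only plain numeric versions, optionally followed by a suffix such as `.post1`.

```python
def caret_range(spec):
    """Expand a Poetry caret constraint such as '^2.1.0' into (lower, upper_exclusive)."""
    lower = spec.lstrip("^")
    # Keep only the leading numeric components, so '0.8.29.post1' -> [0, 8, 29].
    parts = []
    for piece in lower.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Updates may not change the leftmost non-zero component, so the upper
    # bound bumps it; if every component is zero, the last one is bumped.
    pivot = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:pivot] + [parts[pivot] + 1] + [0] * (len(parts) - pivot - 1)
    return lower, ".".join(str(p) for p in upper)


for spec in ("^2.1.0", "^0.5.0", "^0.0.1", "^0.8.29.post1"):
    lower, upper = caret_range(spec)
    print(f"{spec}: >={lower},<{upper}")
```

Note how the allowed range narrows for 0.x versions: `^0.5.0` permits only 0.5.x releases, and `^0.0.1` permits only 0.0.1 itself.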

Licensing and Acknowledgment

This project is licensed under the Apache-2.0 License.

Citing Our Work

If SciPhi plays a role in your research, we kindly ask you to acknowledge us with the following citation:

@software{SciPhi,
author = {Colegrove, Owen},
doi = {Pending},
month = {09},
title = {{SciPhi: A Framework for LLM Powered Data}},
url = {https://github.com/sciphi-ai/sciphi},
year = {2023}
}

Project details

Release history

- 0.1.7: Oct 31, 2023
- 0.1.6: Oct 31, 2023
- 0.1.4: Oct 24, 2023
- 0.1.3: Oct 23, 2023
- 0.1.2: Oct 23, 2023
- 0.1.1: Oct 23, 2023 (this version)
- 0.1.0: Oct 23, 2023

Download files

Source Distribution

sciphi-0.1.1.tar.gz (444.7 kB), uploaded Oct 23, 2023, Source

Built Distribution

sciphi-0.1.1-py3-none-any.whl (475.8 kB), uploaded Oct 23, 2023, Python 3

Hashes for sciphi-0.1.1.tar.gz

Algorithm	Hash digest
SHA256	`bf7e87d6d27c3d39cc33635bc622b52f631c9619446a392e3a26ac56e7ec07a7`
MD5	`9e2f689620a4b160bfc7d3f764d24e49`
BLAKE2b-256	`90f1564f88c3f4ae425a694660e79692442cbd08331cd6df7ef3ec9a913f2c67`

Hashes for sciphi-0.1.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`847e334bcee0bb11c2a1982c680d67adb5eefaea6d1ef08d01c5678a7151ed99`
MD5	`4966ee156ef0e3852fdc7e0fa5becdb7`
BLAKE2b-256	`acf6d1e30e7507fad6de82a4f1c8005b1fe6311bec5922813cbfbf1b2c3c04b7`
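
A downloaded file can be checked against the SHA256 digests listed above using only the standard library. This is a minimal sketch; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the local file path in the final comment is an assumption about where the archive was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a local file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 listed above for sciphi-0.1.1.tar.gz
EXPECTED_SDIST_SHA256 = "bf7e87d6d27c3d39cc33635bc622b52f631c9619446a392e3a26ac56e7ec07a7"
# Example call once the archive is downloaded to the current directory:
# matches_published_hash("sciphi-0.1.1.tar.gz", EXPECTED_SDIST_SHA256)
```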