Merge. Synthesize. Create. Dialektik generates new content by fusing ideas from diverse sources, revealing unexpected insights and perspectives.
Project description
Dialektik
Merge. Synthesize. Create. Dialektik generates new content by fusing ideas from diverse sources, revealing unexpected insights and perspectives.
Features
- Loads and processes datasets from multiple sources
- Summarizes text into concise bullet points
- Synthesizes bullet points into detailed articles
- Supports various AI models for text generation
- Model-agnostic design allows easy swapping of different LLMs
Requirements
- Required:
datasets
,huggingface_hub
- Optional:
phi-3-vision-mlx
(required only if you need to create a new dataset with the providedsetup()
function for custom dataset processing)
Installation
To install Dialektik with core dependencies only:
pip install dialektik
To install Dialektik with all dependencies, including those required for the setup() function:
pip install dialektik[setup]
Note: Install the full version if you plan to process custom datasets using the setup()
function.
Setup
-
Clone the repository:
git clone https://github.com/JosefAlbersç/Dialektik.git cd Dialektik
-
Install the required dependencies:
pip install -r requirements.txt
Usage
Command Line Interface
Dialektik can be used from the command line after installation. Here are some example usages:
-
Generate a synthesis with default settings:
dialektik
-
Specify sources:
dialektik --source arxiv
-
Set the number of bullet points per book and choose a different model:
dialektik --per-book 5 --model "your-preferred-model"
-
Run the setup function:
dialektik --setup
-
For a full list of options, use:
dialektik --help
Accessing the Dataset
Important Note: The default dataset at 'JosefAlbers/StampyAI-alignment-research-dataset' is currently being prepared (ETA: 18 hours). Please check back later if unavailable.
The default dataset is to be publicly available. You do not need to set up any environment variables or run the setup() function to use dialektik
with this dataset.
Synthesizing content
To generate a synthesis, simply run:
from dialektik import synthesize
output = synthesize()
You can customize the synthesis process by passing optional parameters:
output = synthesize(
list_source=['your_source'],
per_book=3,
api_model="mistralai/Mistral-Nemo-Instruct-2407"
)
(Optional) Using Custom Datasets
If you want to use your own dataset:
- Prepare your dataset according to the required format.
- Modify the
PATH_DS
variable in the code to point to your dataset. - If your dataset is private or requires authentication, set up the following environment variables:
HF_WRITE_TOKEN
: Hugging Face write token (for pushing datasets)HF_READ_TOKEN
: Hugging Face read token (for accessing private datasets)
Note: The setup()
function provided in the code is a demonstration of how you might process a custom dataset. Different datasets may require different processing steps, so you'll need to adapt this function to your specific needs.
Customizing the LLM
Dialektik is designed to be model-agnostic. To use a different language model:
- Simply pass the name of your chosen model to the
synthesize()
function using theapi_model
parameter. - Modify the
mistral_api()
function or create a new function that interfaces with your chosen LLM. - Update the
synthesize()
function to use your new LLM interface.
The default model is "mistralai/Mistral-Nemo-Instruct-2407", but you can easily change this by passing a different api_model
parameter to the synthesize()
function.
License
This project is licensed under the MIT License.
Citation
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file dialektik-0.0.1a0.tar.gz
.
File metadata
- Download URL: dialektik-0.0.1a0.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 02bddf98e5b21d1c91f474efbc16c8afd4ed7da8630e33cac98a3fba30c08da9 |
|
MD5 | 78efca994c73226c2377cde20edd71a5 |
|
BLAKE2b-256 | c86ba84f0426f040e16c608b8467fb4a42436921aef0700a7ac0818bb1cf7374 |
File details
Details for the file dialektik-0.0.1a0-py3-none-any.whl
.
File metadata
- Download URL: dialektik-0.0.1a0-py3-none-any.whl
- Upload date:
- Size: 5.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 2a62a6c8ba4a01bcecadce3bf1325d1a183713602f6780c3c21fcb3c1897c603 |
|
MD5 | e911aefe649df2ed2b8c355c1bef3669 |
|
BLAKE2b-256 | 7c12bdd302d0b445467672fcaf0db196ee2cc1f1b5df714bc15017dcf27d8fd1 |