
Dialektik

Merge. Synthesize. Create. Dialektik generates new content by fusing ideas from diverse sources, revealing unexpected insights and perspectives.

Features

  • Loads and processes datasets from multiple sources
  • Summarizes text into concise bullet points
  • Synthesizes bullet points into detailed articles
  • Supports various AI models for text generation
  • Model-agnostic design allows easy swapping of different LLMs
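The pipeline these features describe (summarize each source into bullets, then fuse the bullets into an article) can be pictured with the following sketch. This is a hypothetical illustration of the pipeline shape, not Dialektik's actual implementation; the function names and the naive sentence-splitting "summarizer" are stand-ins for real LLM calls.

```python
from typing import List

def summarize_to_bullets(text: str, per_book: int = 3) -> List[str]:
    """Reduce a document to at most `per_book` bullet points.

    Hypothetical stand-in: a real implementation would call an LLM.
    Here the first sentences serve as the 'bullets'.
    """
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return sentences[:per_book]

def synthesize_bullets(bullet_sets: List[List[str]]) -> str:
    """Fuse bullet points from several sources into one draft article."""
    merged = [b for bullets in bullet_sets for b in bullets]
    return "Synthesis:\n" + "\n".join(f"- {b}" for b in merged)

doc_a = "Dialectics resolves tension. Opposites can be fused. New ideas emerge."
doc_b = "Synthesis creates novelty. Sources differ in perspective."
article = synthesize_bullets([
    summarize_to_bullets(doc_a, per_book=2),
    summarize_to_bullets(doc_b, per_book=2),
])
print(article)
```

Swapping the stub bodies for real model calls leaves the pipeline structure unchanged, which is the point of the model-agnostic design.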

Requirements

  • Required: datasets, huggingface_hub
  • Optional: phi-3-vision-mlx (required only for building a new dataset with the provided setup() function)

Installation

To install Dialektik with core dependencies only:

pip install dialektik

To install Dialektik with all dependencies, including those required for the setup() function:

pip install dialektik[setup]

Note: Install the full version if you plan to process custom datasets using the setup() function.

Setup

  1. Clone the repository:

    git clone https://github.com/JosefAlbers/Dialektik.git
    cd Dialektik
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    

Usage

Command Line Interface

Dialektik can be used from the command line after installation. Here are some example usages:

  1. Generate a synthesis with default settings:

    dialektik
    
  2. Specify sources:

    dialektik --source arxiv
    
  3. Set the number of bullet points per book and choose a different model:

    dialektik --per-book 5 --model "your-preferred-model"
    
  4. Run the setup function:

    dialektik --setup
    
  5. For a full list of options, use:

    dialektik --help
    

Accessing the Dataset

Important Note: The default dataset at 'JosefAlbers/StampyAI-alignment-research-dataset' is currently being prepared (ETA: 18 hours). Please check back later if unavailable.

The default dataset will be publicly available. You do not need to set up any environment variables or run the setup() function to use Dialektik with this dataset.

Synthesizing content

To generate a synthesis, simply run:

from dialektik import synthesize

output = synthesize()

You can customize the synthesis process by passing optional parameters:

output = synthesize(
   list_source=['your_source'],
   per_book=3,
   api_model="mistralai/Mistral-Nemo-Instruct-2407"
)

(Optional) Using Custom Datasets

If you want to use your own dataset:

  1. Prepare your dataset according to the required format.
  2. Modify the PATH_DS variable in the code to point to your dataset.
  3. If your dataset is private or requires authentication, set up the following environment variables:
    • HF_WRITE_TOKEN: Hugging Face write token (for pushing datasets)
    • HF_READ_TOKEN: Hugging Face read token (for accessing private datasets)
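Step 3 can be sketched as follows, assuming the standard `os.environ` mechanism and the `datasets` library's `token` parameter. The token values below are placeholders, and the dataset path is hypothetical; substitute your own.

```python
import os

# Placeholder values for illustration only; use your real Hugging Face tokens,
# e.g. exported in your shell rather than hard-coded in a script.
os.environ.setdefault("HF_WRITE_TOKEN", "hf_write_placeholder")
os.environ.setdefault("HF_READ_TOKEN", "hf_read_placeholder")

read_token = os.environ["HF_READ_TOKEN"]

# A private dataset could then be loaded with, for example:
#   from datasets import load_dataset
#   ds = load_dataset("your-username/your-dataset", token=read_token)
print(read_token)
```

In practice you would set these tokens in your environment (e.g. `export HF_READ_TOKEN=...`) so they never appear in source code.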

Note: The setup() function provided in the code is a demonstration of how you might process a custom dataset. Different datasets may require different processing steps, so you'll need to adapt this function to your specific needs.

Customizing the LLM

Dialektik is designed to be model-agnostic. To use a different language model:

  1. For another API-hosted model, simply pass its name to the synthesize() function via the api_model parameter.
  2. For a different LLM interface, modify the mistral_api() function or write a new function that interfaces with your chosen LLM, then
  3. Update the synthesize() function to use your new LLM interface.

The default model is "mistralai/Mistral-Nemo-Instruct-2407", but you can easily change this by passing a different api_model parameter to the synthesize() function.
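One way to picture this model-agnostic design is a registry that maps model names to backends, so synthesize() stays unchanged when a model is swapped. The names mistral_api, synthesize, and api_model come from this README; the registry dispatch below is an assumption, not Dialektik's actual code, and both backends are stubs.

```python
from typing import Callable, Dict

# Hypothetical registry mapping model identifiers to callables that take a
# prompt and return generated text.
LLM_BACKENDS: Dict[str, Callable[[str], str]] = {}

def register_backend(name: str):
    """Register a text-generation backend under a model name."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        LLM_BACKENDS[name] = fn
        return fn
    return decorator

@register_backend("mistralai/Mistral-Nemo-Instruct-2407")
def mistral_api_stub(prompt: str) -> str:
    # A real backend would call the Mistral API here.
    return f"[mistral] {prompt}"

@register_backend("your-preferred-model")
def custom_llm_stub(prompt: str) -> str:
    # Stand-in for any other LLM interface you wire up.
    return f"[custom] {prompt}"

def synthesize_stub(prompt: str,
                    api_model: str = "mistralai/Mistral-Nemo-Instruct-2407") -> str:
    """Route the prompt to whichever backend `api_model` names."""
    return LLM_BACKENDS[api_model](prompt)

print(synthesize_stub("Fuse these ideas", api_model="your-preferred-model"))
```

With this shape, adding a new LLM is one registered function; callers only change the api_model string.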

License

This project is licensed under the MIT License.


Download files

Download the file for your platform.

Source Distribution

dialektik-0.0.1a0.tar.gz (5.6 kB)


Built Distribution

dialektik-0.0.1a0-py3-none-any.whl (5.9 kB)


File details

Details for the file dialektik-0.0.1a0.tar.gz.

File metadata

  • Download URL: dialektik-0.0.1a0.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for dialektik-0.0.1a0.tar.gz:

  • SHA256: 02bddf98e5b21d1c91f474efbc16c8afd4ed7da8630e33cac98a3fba30c08da9
  • MD5: 78efca994c73226c2377cde20edd71a5
  • BLAKE2b-256: c86ba84f0426f040e16c608b8467fb4a42436921aef0700a7ac0818bb1cf7374


File details

Details for the file dialektik-0.0.1a0-py3-none-any.whl.

File metadata

  • Download URL: dialektik-0.0.1a0-py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for dialektik-0.0.1a0-py3-none-any.whl:

  • SHA256: 2a62a6c8ba4a01bcecadce3bf1325d1a183713602f6780c3c21fcb3c1897c603
  • MD5: e911aefe649df2ed2b8c355c1bef3669
  • BLAKE2b-256: 7c12bdd302d0b445467672fcaf0db196ee2cc1f1b5df714bc15017dcf27d8fd1

