Multinear platform

Project description

Multinear: A Platform for Developing and Testing GenAI Applications

Multinear is a platform for developing Generative AI applications by running experiments, measuring results, and surfacing insights. It lets developers run their GenAI-powered workflows under different configurations, collect metadata, and analyze outcomes to build reliable applications.

Table of Contents

Features
Getting Started
Usage
Analyzing Results
Architecture
Contributing
License

Features

Experiment Workflow: Run experiments with different configurations of models, prompts, datasets, and business logic.
Result Tracking: Automatically saves metadata and results of each experiment for analysis.
Regression Detection: Identify regressions when new changes impact previously working cases.
Evaluation Framework: Supports various evaluation methods including direct comparison, LLM-as-a-judge, and human evaluation.
Comprehensive Insights: Compare results across runs, visualize performance trends, and understand the impact of changes.
Security Testing: Evaluate your application against malicious inputs, guardrails, and safety measures.
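
To make the Regression Detection feature concrete, here is a minimal sketch; it is not Multinear's actual implementation. It assumes each run can be reduced to a mapping from task ID to a pass/fail flag. The function name find_regressions and that data shape are hypothetical.

```python
from typing import Dict, List


def find_regressions(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return IDs of tasks that passed in the previous run but fail in the current one.

    Hypothetical helper: the dict-of-booleans shape is an assumption made for
    illustration, not Multinear's storage format.
    """
    return sorted(
        task_id
        for task_id, passed_before in previous.items()
        # Tasks missing from the current run are not counted as regressions.
        if passed_before and current.get(task_id) is False
    )


previous_run = {"task1": True, "task2": True, "task3": False}
current_run = {"task1": True, "task2": False, "task3": False}
print(find_regressions(previous_run, current_run))
```

A task that was already failing (task3 here) is not reported, since only a pass-to-fail transition counts as a regression in this sketch.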

Getting Started

Installation

To install Multinear and its dependencies, run:

git clone https://github.com/multinear/multinear.git
cd multinear
make install

This will install the required Python packages.

Project Initialization

Initialize a new Multinear project in your desired directory:

multinear init

You will be prompted to enter your project details:

Project name: The name of your project.
Project ID: A URL-friendly identifier for your project (default provided).
Project description: A brief description of your project.

This command creates a .multinear directory containing your project configuration.

Running the Platform

Start the Multinear web server:

multinear web

By default, the server runs on http://127.0.0.1:8000. Open that address in your browser to use the frontend.

For development mode with auto-reload on file changes:

multinear web_dev

Usage

Defining Your Task Runner

Create a task_runner.py in the .multinear directory of your project. This file defines the run_task(input) function, which contains the logic for processing each task.

Example task_runner.py:

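import my_application  # placeholder: replace with your own application module


def run_task(input):
    # Your GenAI-powered application logic here
    output = my_application.process(input)
    # Metadata saved alongside the result, such as the model used
    details = {'model': 'gpt-4o'}
    return {'output': output, 'details': details}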
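
The example above relies on my_application, which stands in for your own code. As a self-contained sketch, the version below swaps in a stub so the return shape can be tried without any model access. Only the run_task(input) name and the output/details keys come from the example above; fake_model and the 'stub' model label are hypothetical placeholders.

```python
def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; it only echoes the prompt."""
    return f"Echo: {prompt}"


def run_task(input):
    # Replace fake_model with a call into your GenAI application.
    output = fake_model(input)
    # 'details' carries metadata about the run; 'stub' is a placeholder value.
    details = {'model': 'stub'}
    return {'output': output, 'details': details}


result = run_task("Input data for task 1")
print(result['output'])
```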

Configuring Tasks and Evaluations

Define your tasks and evaluation criteria in .multinear/config.yaml.

Example config.yaml:

project:
  id: my-genai-project
  name: My GenAI Project
  description: Experimenting with GenAI models

tasks:
  - id: task1
    input: "Input data for task 1"
    min_score: 0.8
    checklist:
      - "The output should be in English."
      - "The response should be polite."
  - id: task2
    input: "Input data for task 2"
    min_score: 1.0
    checklist:
      - "The output should include at least two examples."
      - "The response should be less than 500 words."
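
The configuration does not spell out how min_score and checklist interact. One plausible reading, offered here only as an assumption, is that a task's score is the fraction of checklist items judged satisfied, and the task passes when that score reaches min_score. The sketch below works through that arithmetic. The names checklist_score and task_passes are hypothetical, and the real evaluator may score differently.

```python
from typing import List


def checklist_score(item_results: List[bool]) -> float:
    """Fraction of checklist items satisfied (assumed scoring rule)."""
    if not item_results:
        return 0.0
    return sum(item_results) / len(item_results)


def task_passes(item_results: List[bool], min_score: float) -> bool:
    """A task passes when its score reaches the configured min_score (assumption)."""
    return checklist_score(item_results) >= min_score


# task1 has two checklist items and min_score 0.8:
# satisfying only one of them gives 0.5, which is below the threshold.
print(task_passes([True, False], min_score=0.8))
# task2 has min_score 1.0, so every item must be satisfied.
print(task_passes([True, True], min_score=1.0))
```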

Running Experiments

You can run experiments either through the command line interface (CLI) or the web frontend.

Using the CLI

Run an experiment using the run command:

multinear run

This will:

Start a new experiment run
Show real-time progress with a progress bar
Display current status and results
Save detailed output to .multinear/last_output.txt
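
Because the run command saves its output to .multinear/last_output.txt, a script can pick that file up afterwards. The helper below is a small sketch of that idea. The function name read_last_output is hypothetical, and it assumes the file is plain UTF-8 text, which the documentation does not state.

```python
from pathlib import Path
from typing import Optional


def read_last_output(project_dir: str = ".") -> Optional[str]:
    """Return the saved output of the latest CLI run, or None if there is none."""
    output_file = Path(project_dir) / ".multinear" / "last_output.txt"
    if not output_file.is_file():
        return None
    # Assumption: the file is UTF-8 encoded plain text.
    return output_file.read_text(encoding="utf-8")


text = read_last_output()
print("No saved output found." if text is None else text)
```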

View recent experiment results:

multinear recent

Get detailed information about a specific run:

multinear details <run-id>

Using the Frontend

Start the web server if not already running:

multinear web

Open http://127.0.0.1:8000 in your browser
Click "Run Experiment" to start an experiment

The frontend provides:

Real-time progress tracking
Interactive results visualization
Detailed task-level information
Ability to compare multiple runs

Analyzing Results

Once the experiment run is complete, you can analyze the results via the frontend dashboard. The platform provides:

Run Summaries: Overview of each experiment run, including total tasks, passed/failed counts, and overall score.
Detailed Reports: Drill down into individual tasks to see input, output, logs, and evaluation details.
Trend Analysis: Compare results across runs to identify improvements or regressions.
Filter and Search: Find specific tasks or runs based on criteria such as task (challenge) ID, date, or status.
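
As an illustration of what a run summary aggregates, the sketch below computes total tasks, passed/failed counts, and an overall score from per-task results. The TaskResult class and summarize function are hypothetical, and treating the overall score as the mean of task scores is an assumption rather than Multinear's documented behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskResult:
    """Hypothetical per-task record; not Multinear's data model."""
    task_id: str
    score: float
    min_score: float

    @property
    def passed(self) -> bool:
        return self.score >= self.min_score


def summarize(results: List[TaskResult]) -> dict:
    """Aggregate task results into total, passed/failed counts, and a mean score."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    # Assumption: the overall score is the unweighted mean of task scores.
    overall = sum(r.score for r in results) / total if total else 0.0
    return {"total": total, "passed": passed, "failed": total - passed, "score": overall}


results = [TaskResult("task1", 1.0, 0.8), TaskResult("task2", 0.5, 1.0)]
print(summarize(results))
```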

Architecture

Multinear consists of several components:

CLI Tool (cli/main.py): Command-line interface for initializing projects, running experiments, and starting the web server.
Web Server (main.py): A FastAPI application serving API endpoints and static Svelte frontend files.
Engine (engine/ Directory):
- Run Management (run.py): Handles execution of tasks and evaluation.
- Storage (storage.py): Manages data models and database operations using SQLAlchemy.
- Evaluation (evaluate.py, checklist.py): Provides evaluation mechanisms for task outputs.
API (api/ Directory): Defines API routes and schemas for interaction with the frontend.
Utilities (utils/capture.py): Captures task execution output and logs.
Frontend: A Svelte-based interface for interacting with the platform (located in multinear/frontend/).
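
To give a sense of what an output-capturing utility does, here is a minimal sketch using only the Python standard library. It does not reproduce utils/capture.py; the function name run_with_capture and the choice to capture only standard output are assumptions made for illustration.

```python
import contextlib
import io
from typing import Any, Callable, Tuple


def run_with_capture(func: Callable[[Any], Any], task_input: Any) -> Tuple[Any, str]:
    """Call func(task_input) and return its result plus everything it printed."""
    buffer = io.StringIO()
    # redirect_stdout temporarily points sys.stdout at the in-memory buffer.
    with contextlib.redirect_stdout(buffer):
        result = func(task_input)
    return result, buffer.getvalue()


def noisy_task(text: str) -> str:
    # Example task that prints a log line while it works.
    print(f"processing {text!r}")
    return text.upper()


result, captured = run_with_capture(noisy_task, "hello")
print(result)
print(captured, end="")
```

Keeping the captured text separate from the return value lets a runner store logs next to each task's output without the task code needing to know about it.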

Contributing

Contributions are welcome! To contribute:

Fork the repository.
Create a new branch for your feature or bugfix.
Make your changes with clear commit messages.
Submit a pull request to the main branch.

Please ensure that your code adheres to the project's coding standards and passes all tests.

License

Multinear is released under the MIT License. You are free to use, modify, and distribute this software as per the terms of the license.

Project details

Release history

0.1.8: Apr 17, 2025
0.1.7: Apr 15, 2025
0.1.6: Mar 24, 2025
0.1.5: Mar 6, 2025
0.1.4: Feb 18, 2025
0.1.3: Jan 12, 2025
0.1.1: Nov 27, 2024
0.1.0 (this version): Nov 26, 2024

Download files

Source Distribution

multinear-0.1.0.tar.gz (585.1 kB)

Uploaded Nov 26, 2024 Source

Built Distribution

multinear-0.1.0-py3-none-any.whl (673.4 kB)

Uploaded Nov 26, 2024 Python 3

File details

Details for the file multinear-0.1.0.tar.gz.

File metadata

Download URL: multinear-0.1.0.tar.gz
Upload date: Nov 26, 2024
Size: 585.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.8.20

File hashes

Hashes for multinear-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`2d0a74a656cbab3f51e6cd28c7804b48a172b093e7e5453e6bdf3d08979c555c`
MD5	`9570942431aadf1d47bc50e7fb236866`
BLAKE2b-256	`5aaab1e7eb1cc00586a5345bffc500fbd40560cbfe6d2524207dd313743aee89`

File details

Details for the file multinear-0.1.0-py3-none-any.whl.

File metadata

Download URL: multinear-0.1.0-py3-none-any.whl
Upload date: Nov 26, 2024
Size: 673.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.8.20

File hashes

Hashes for multinear-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`edacdc9a562c70725ce0936d6676312dd157f526f18bc7f283562c3b2ee67f13`
MD5	`cccb8e294df8f06b5b5a49edcf58df6e`
BLAKE2b-256	`47f361afbc94e624da1c294d71d42ea47689f609c452317751f6296115a753b2`
