Skip to main content

No project description provided

Project description

🌸 Nagato

Nagato is a framework that enables any developer to streamline the creation of fine-tuned embedding and language models specifically tailored to a given corpus of data

GitHub Contributors GitHub Last Commit GitHub Issues GitHub Pull Requests Github License Discord


Quick Start GuideFeaturesKey benefitsHow it works


Features

  • Data ingestion from various formats such as JSON, CSV, TXT, PDF, etc.
  • Data embedding using pre-trained or finetuned models.
  • Storage of embedded vectors
  • Automatic generation of question/answer pairs for model finetuning
  • Built in code interpreter
  • API concurrency for scalalbility and performance
  • Workflow management for ingestion pipelines

Key benefits

  • Faster inference: Generic models often bring overhead in terms of computational time due to their broad-based training. In contrast, our fine-tuned models are optimized for specific domains, enabling faster inference and more timely results.

  • Lower costs: Utilizing fine-tuned models tailored for a specific corpus minimizes the number of tokens needed for accurate understanding and response generation. This reduction in token count translates to decreased computational costs and thus lower operational expenses.

  • Better results: Fine-tuned models offer superior performance on specialized tasks when compared to generic, all-purpose models. Whether you're generating embeddings or answering complex queries, you can expect more accurate and contextually relevant outcomes.

How it works

Nagato utilizes distinct strategies to process structured and unstructured data, aiming to produce fine-tuned models for both types. Below is a breakdown of how this is accomplished:

Untitled-2023-10-01-2152

Unstructured data:

  1. Selection of Embedding Model: The first step involves a careful analysis of the textual content to select an appropriate text-based embedding model. Based on various characteristics of the corpus such as vocabulary, context, and domain-specific jargon, Nagato picks the most suitable pre-trained text-based model for embedding.

  2. Fine-Tuning the Embedding Model: Once the initial text-based model is selected, it is then fine-tuned to align more closely with the specific domain or subject matter of the corpus. This ensures that the embeddings generated are as accurate and relevant as possible.

  3. Fine-Tuning the Language Model: After generating and storing embeddings, Nagato creates question-answer pairs for the purpose of fine-tuning a GPT-based language model. This yields a language model that is highly specialized in understanding and generating text within the domain of the corpus.

Structured data:

  1. Sandboxed REPL: Nagato features a secure, sandboxed Read-Eval-Print Loop (REPL) environment to execute code snippets against the structured text data. This facilitates flexible and dynamic processing of structured data formats like JSON, CSV or XML.

  2. Evaluation/Prediction Using a Code Interpreter: Post-initial processing, a code interpreter evaluates various code snippets within the sandboxed environment to produce predictions or analyses based on the structured text data. This capability allows the extraction of highly specialized insights tailored to the domain or subject matter.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nagato_ai-0.0.12.tar.gz (10.6 kB view details)

Uploaded Source

Built Distribution

nagato_ai-0.0.12-py3-none-any.whl (10.9 kB view details)

Uploaded Python 3

File details

Details for the file nagato_ai-0.0.12.tar.gz.

File metadata

  • Download URL: nagato_ai-0.0.12.tar.gz
  • Upload date:
  • Size: 10.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.4 Darwin/22.4.0

File hashes

Hashes for nagato_ai-0.0.12.tar.gz
Algorithm Hash digest
SHA256 b4c12e39150cfe1d59348ac27c10620f81959261969860527a82c9ade41889e1
MD5 ad214ad3da9efdb590fe6464f56966f9
BLAKE2b-256 2a83449bbbf03a2c75f6412c8f40570db464dd9d8e8bf525bb307de01d2927c1

See more details on using hashes here.

File details

Details for the file nagato_ai-0.0.12-py3-none-any.whl.

File metadata

  • Download URL: nagato_ai-0.0.12-py3-none-any.whl
  • Upload date:
  • Size: 10.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.11.4 Darwin/22.4.0

File hashes

Hashes for nagato_ai-0.0.12-py3-none-any.whl
Algorithm Hash digest
SHA256 77397857397f670deec753a94d88f827ceb405b4a1028cbea881b23c9533bbcb
MD5 b87068e7b8968f9d32f1d6e8c47df095
BLAKE2b-256 4283cd512713fc4fe57019660890d764a6b25e7e3e9c0e24072cd9b0e0d6d36f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page