
A library to standardize the usage of various machine learning models


Neural_sync Library

Overview

The Neural_sync Library provides a unified interface for working with different machine learning models across various tasks. This library aims to standardize the way models are loaded, parameters are set, and results are generated, enabling a consistent approach regardless of the model type.

Supported Models

Text-to-Speech (TTS)

  • FastPitch & HiFi-GAN: Convert text to high-quality speech audio.

Speech-to-Text (STT)

  • Distil-Whisper (distil-large-v2): Transcribe speech to text.
  • OpenAI Whisper (whisper-large-v3): Transcribe speech to text.
  • NVIDIA NeMo ASR: Transcribe speech to text.

Speaker Diarization

  • Pyannote 3.1: Identify and separate speakers in an audio file.

Voice Activity Detection (VAD)

  • Silero VAD: Detect speech segments in audio files.

Text-to-Image Generation

  • Stable Diffusion 3 Medium: Generate images from text prompts.

Transformers-based Models

  • LLaMA 2, LLaMA 3, LLaMA 3.1
  • Mistral v2, Mistral v3
  • Phi-3.5 Mini
  • AgentLM 7B

Quantized Models

  • LLaMA 2, LLaMA 3, LLaMA 3.1
  • Mistral v2, Mistral v3
  • Phi-3.5 Mini
  • AgentLM 7B

Installation

To install the library and its dependencies, use the following command:

pip install -r requirements.txt
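
The package is also published on PyPI, so it can be installed directly (assuming the published wheel declares its dependencies):

pip install neural_sync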

Usage Examples

1. Transformers

Transformers models are versatile and can be used for various NLP tasks. Here's an example using the AgentLM model:

from parent.factory import ModelFactory
model = ModelFactory.get_model("agentlm")  # no need to specify model_path
params = ModelFactory.load_params_from_json('parameters.json')
model.set_params(**params)
response = model.generate(
    prompt="What is Artificial Intelligence?",
    system_prompt="Answer in German."
)
print(response)
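
The contents of parameters.json are not shown in this description. Since set_params accepts keyword arguments, generation settings can also be passed inline; the parameter names below (temperature, top_p, max_new_tokens) are common generation options and are assumptions, not confirmed by this library:

# Hypothetical parameter names; check parameters.json for the keys this library actually expects
model.set_params(temperature=0.7, top_p=0.9, max_new_tokens=256)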

Similarly, pass one of the following strings to ModelFactory.get_model to load the other Transformers models (see the example after this list):

  • agentlm
  • Phi3_5
  • llama2
  • llama3
  • llama3_1
  • Mistralv2
  • Mistralv3
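
For example, switching to LLaMA 3 changes only the factory string; the load/set/generate workflow above stays the same:

model = ModelFactory.get_model("llama3")  # same workflow as above, different model string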

2. FastPitch (Text-to-Speech)

FastPitch is used for generating speech from text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("fastpitch")

response = model.generate(text="Hello, this is Hasan Maqsood", output_path="Hasan.wav")
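
The same call can be looped to synthesize several clips; this sketch uses only the generate() signature shown above:

# Generate one WAV file per line of text
for i, line in enumerate(["Hello there.", "Goodbye for now."]):
    model.generate(text=line, output_path=f"clip_{i}.wav")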

3. Voice Activity Detection (VAD)

Silero VAD is used for detecting speech timestamps in audio files:

from parent.factory import ModelFactory
model = ModelFactory.get_model("silero_vad")

response = model.generate("Youtube.wav")
print("Speech Timestamps:", response)

4. Speaker Diarization

Pyannote is used for speaker diarization:

from parent.factory import ModelFactory
model = ModelFactory.get_model("pyannote",use_auth_token="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")
response = model.generate("Hasan.wav", visualize =True)

5. Automatic Speech Recognition (Speech-to-Text)

NeMo ASR is used for transcribing audio to text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("nemo_asr")
response = model.generate(audio_files=["Hasan.wav"])
print(response)
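
Because generate() accepts a list via audio_files, several files can be transcribed in one call; this sketch assumes the response contains one transcription per input file, which this description does not confirm:

# Batch transcription; one result per input file is assumed
files = ["Hasan.wav", "Youtube.wav"]
for path, text in zip(files, model.generate(audio_files=files)):
    print(path, "->", text)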

6. Distil-Whisper / OpenAI Whisper (Speech-to-Text)

Distil-Whisper is used for transcribing audio to text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("whisper")
response = model.generate("Youtube.wav")
print("Transcription:", response)

OpenAI Whisper can also be used for transcribing audio to text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("openai-whisper")
response = model.generate("Youtube.wav")
print("Transcription:", response)

7. Text-to-Image Generation

Stable Diffusion is used for generating images from text prompts:

from parent.factory import ModelFactory
model = ModelFactory.get_model("sd_medium3")
response = model.generate(prompt="House")
image_path = "new_house.png"
response.save(image_path)
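
Since the response exposes a save() method, it is presumably a PIL image; if so (an assumption about the return type), standard Pillow operations apply:

# Assumes `response` is a PIL.Image.Image
thumbnail = response.resize((256, 256))
thumbnail.save("new_house_thumb.png")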

