A library to standardize the usage of various machine learning models

Neural_sync Library

Overview

The Neural_sync Library provides a unified interface for working with different machine learning models across various tasks. This library aims to standardize the way models are loaded, parameters are set, and results are generated, enabling a consistent approach regardless of the model type.
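The unified interface follows a classic factory pattern: you ask a factory for a model by name and get back an object with the same `set_params`/`generate` surface regardless of model type. A minimal sketch of that pattern (illustrative only, not the library's actual implementation; the class and method names simply mirror the usage examples below):

```python
# Illustrative sketch of a factory with a unified model interface.
# This is NOT Neural_sync's implementation -- it only shows the pattern
# the examples below rely on: get_model(name) -> set_params/generate.

class BaseModel:
    def set_params(self, **params):
        self.params = params

    def generate(self, *args, **kwargs):
        raise NotImplementedError


class EchoModel(BaseModel):
    """Stand-in 'model' that just echoes its prompt."""
    def generate(self, prompt, **kwargs):
        return f"echo: {prompt}"


class ModelFactory:
    _registry = {"echo": EchoModel}

    @classmethod
    def get_model(cls, name, **kwargs):
        return cls._registry[name](**kwargs)


model = ModelFactory.get_model("echo")
model.set_params(temperature=0.7)
print(model.generate(prompt="hello"))
```

The point of the pattern is that swapping models changes only the string passed to `get_model`, not the calling code.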

Supported Models

Text-to-Speech (TTS)

  • FastPitch & HiFi-GAN: Convert text to high-quality speech audio.

Speech-to-Text (STT)

  • Distil-Large-V2: Transcribe speech to text.
  • openai/whisper-large-v3: Transcribe speech to text.
  • NeMo ASR: Transcribe speech to text.

Speaker Diarization

  • Pyannote 3.1: Identify and separate speakers in an audio file.

Voice Activity Detection (VAD)

  • Silero VAD: Detect speech segments in audio files.

Text-to-Image Generation

  • Stable Diffusion Medium-3: Generate images from text prompts.

Transformers-based Models

  • llama2, llama3, llama3_1
  • Mistralv2, Mistralv3
  • Phi3.5 Mini
  • AgentLM 7b

Quantized Models

  • LLaMA 2, LLaMA 3, LLaMA 3.1
  • Mistral v2, Mistral v3
  • Phi3.5 Mini
  • AgentLM 7b

Installation

To install the library and its dependencies, use the following command:

pip install -r requirements.txt

Usage Examples

1. Transformers

Transformers models are versatile and can be used for various NLP tasks. Here's an example using the AgentLM model:

from parent.factory import ModelFactory

model = ModelFactory.get_model("agentlm")  # No need to specify model_path
params = ModelFactory.load_params_from_json("parameters.json")
model.set_params(**params)
response = model.generate(
    prompt="What is Artificial Intelligence?",
    system_prompt="Answer in German."
)
print(response)

Similarly, use the following identifier strings for the other Transformers-based models:

  • agentlm
  • Phi3_5
  • llama2
  • llama3
  • llama3_1
  • Mistralv2
  • Mistralv3
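The `parameters.json` file loaded above is presumably a plain JSON dictionary of generation settings passed through to `set_params`. The exact keys the library expects are not documented here, so the field names below are assumptions for illustration:

```python
import json
import tempfile

# Hypothetical parameters.json contents -- the exact keys depend on the
# model; these names are illustrative assumptions, not the library's schema.
params = {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
}

# Write the file, then read it back the way a load_params_from_json
# helper plausibly does: a straight json.load of the whole dict.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(params, f)
    path = f.name

with open(path) as f:
    loaded = json.load(f)

print(loaded)  # the dict is then unpacked into model.set_params(**loaded)
```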

2. FastPitch (Text-to-Speech)

FastPitch is used for generating speech from text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("fastpitch")

response = model.generate(text="Hello, this is Hasan Maqsood", output_path="Hasan.wav")
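Assuming FastPitch writes a standard PCM WAV file, the result can be inspected with the stdlib `wave` module. Since this snippet has to run on its own, a silent placeholder WAV stands in for the generated `Hasan.wav`:

```python
import wave

# Create a 1-second silent 22.05 kHz mono WAV as a stand-in for the
# file a TTS model would write, so this snippet is self-contained.
sample_rate = 22050
with wave.open("placeholder.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)           # 16-bit PCM
    w.setframerate(sample_rate)
    w.writeframes(b"\x00\x00" * sample_rate)

# Inspect the output the same way you would inspect Hasan.wav.
with wave.open("placeholder.wav", "rb") as w:
    duration = w.getnframes() / w.getframerate()
    print(f"{duration:.2f} s, {w.getframerate()} Hz, {w.getnchannels()} channel(s)")
```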

3. Voice Activity Detection (VAD)

Silero VAD is used for detecting speech timestamps in audio files:

from parent.factory import ModelFactory
model = ModelFactory.get_model("silero_vad")

response = model.generate("Youtube.wav")
print("Speech Timestamps:", response)
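Silero VAD conventionally reports speech segments as sample offsets at a 16 kHz rate, so converting to seconds is a single division. The timestamp values below are made up for illustration, not real model output:

```python
# Hypothetical Silero-style output: start/end offsets in samples at 16 kHz.
# (These numbers are invented for illustration.)
speech_timestamps = [
    {"start": 0, "end": 32000},
    {"start": 48000, "end": 80000},
]

SAMPLE_RATE = 16000

# Convert sample offsets to seconds for easier reading.
segments_s = [
    {"start": t["start"] / SAMPLE_RATE, "end": t["end"] / SAMPLE_RATE}
    for t in speech_timestamps
]
total_speech = sum(seg["end"] - seg["start"] for seg in segments_s)
print(segments_s)    # [{'start': 0.0, 'end': 2.0}, {'start': 3.0, 'end': 5.0}]
print(total_speech)  # 4.0 seconds of detected speech
```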

4. Speaker Diarization

Pyannote is used for speaker diarization:

from parent.factory import ModelFactory
model = ModelFactory.get_model("pyannote", use_auth_token="<YOUR_HF_TOKEN>")  # your Hugging Face access token
response = model.generate("Hasan.wav", visualize=True)
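Diarization output is essentially a list of (start, end, speaker) segments; a common next step is computing per-speaker talk time. The segments below are hypothetical values, not actual Pyannote output:

```python
# Hypothetical diarization segments: (start_s, end_s, speaker_label).
segments = [
    (0.0, 4.2, "SPEAKER_00"),
    (4.2, 7.0, "SPEAKER_01"),
    (7.0, 12.5, "SPEAKER_00"),
]

# Aggregate total speaking time per speaker.
talk_time = {}
for start, end, speaker in segments:
    talk_time[speaker] = talk_time.get(speaker, 0.0) + (end - start)

for speaker, seconds in sorted(talk_time.items()):
    print(f"{speaker}: {seconds:.1f} s")
```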

5. Automatic Speech Recognition (Speech-To-Text)

Nemo ASR is used for transcribing audio to text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("nemo_asr")
response = model.generate(audio_files=["Hasan.wav"])
print(response)

6. Distil / OpenAI Whisper (Speech-To-Text)

Distil-whisper is used for transcribing audio to text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("whisper")
response = model.generate("Youtube.wav")
print("Transcription:", response)

OpenAI Whisper can likewise be used for transcribing audio to text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("openai-whisper")
response = model.generate("Youtube.wav")
print("Transcription:", response)
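With several STT backends available, it is useful to compare their transcriptions against a reference. A self-contained word error rate (WER) sketch using word-level edit distance (not part of the library):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```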

7. Text-to-Image Generation

Stable Diffusion is used for generating images from text prompts:

from parent.factory import ModelFactory
model = ModelFactory.get_model("sd_medium3")
response = model.generate(prompt="House")
image_path = "new_house.png"
response.save(image_path)
