A library to standardize the usage of various machine learning models
Neural_sync Library
Overview
The Neural_sync Library provides a unified interface for working with different machine learning models across various tasks. This library aims to standardize the way models are loaded, parameters are set, and results are generated, enabling a consistent approach regardless of the model type.
Supported Models
Text-to-Speech (TTS)
- FastPitch & HiFi-GAN: Convert text to high-quality speech audio.
Speech-to-Text (STT)
- Distil-Whisper (distil-large-v2): Transcribe speech to text.
- OpenAI Whisper (whisper-large-v3): Transcribe speech to text.
- NeMo ASR: Transcribe speech to text.
Speaker Diarization
- Pyannote 3.1: Identify and separate speakers in an audio file.
Voice Activity Detection (VAD)
- Silero VAD: Detect speech segments in audio files.
Text-to-Image Generation
- Stable Diffusion 3 Medium: Generate images from text prompts.
Transformers-based Models
- LLaMA 2, LLaMA 3, LLaMA 3.1
- Mistral v2, Mistral v3
- Phi-3.5 Mini
- AgentLM 7B
Quantized Models
- LLaMA 2, LLaMA 3, LLaMA 3.1
- Mistral v2, Mistral v3
- Phi-3.5 Mini
- AgentLM 7B
Installation
To install the library and its dependencies, use the following command:
pip install -r requirements.txt
Usage Examples
1. Transformers
Transformers models are versatile and can be used for various NLP tasks. Here's an example using the AgentLM model:
from parent.factory import ModelFactory

model = ModelFactory.get_model("agentlm")  # No need to specify model_path
params = ModelFactory.load_params_from_json('parameters.json')
model.set_params(**params)
response = model.generate(
    prompt="What is Artificial Intelligence?",
    system_prompt="Answer in German."
)
print(response)
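The parameters file passed to load_params_from_json is plain JSON. The exact keys depend on the model; the names below (max_new_tokens, temperature, top_p) are illustrative assumptions, not the library's documented schema:

```python
import json

# Hypothetical parameters.json contents -- the key names are assumptions,
# not neural_sync's documented schema.
params_json = """
{
    "max_new_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9
}
"""

params = json.loads(params_json)
# The resulting dict can be unpacked into set_params(**params).
print(params["temperature"])  # 0.7
```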
Similarly, use one of the following strings to load the other transformer models:
- agentlm
- Phi3_5
- llama2
- llama3
- llama3_1
- Mistralv2
- Mistralv3
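The string-keyed lookup above follows a standard factory pattern: each string maps to a model class behind a common interface. A minimal sketch of how such a registry can work (the class names here are illustrative, not neural_sync internals):

```python
class BaseModel:
    """Common interface every backend implements."""
    def generate(self, prompt: str) -> str:
        raise NotImplementedError

class EchoModel(BaseModel):
    """Stand-in for a real backend such as llama3."""
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

class ModelFactory:
    # Maps the user-facing string to the class implementing it.
    _registry = {"echo": EchoModel}

    @classmethod
    def get_model(cls, name: str) -> BaseModel:
        try:
            return cls._registry[name]()  # instantiate the registered class
        except KeyError:
            raise ValueError(f"Unknown model: {name}") from None

model = ModelFactory.get_model("echo")
print(model.generate("hi"))  # echo: hi
```

Because every backend shares the same generate interface, calling code stays identical when you swap the model string.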
2. FastPitch (Text-to-Speech)
FastPitch is used for generating speech from text:
from parent.factory import ModelFactory
model = ModelFactory.get_model("fastpitch")
response = model.generate(text="Hello, this is Hasan Maqsood", output_path="Hasan.wav")
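Assuming FastPitch writes a standard PCM WAV, the output can be sanity-checked with the standard library's wave module. The snippet below synthesizes a short sine tone as a stand-in for the generated "Hasan.wav", then inspects it; the 22050 Hz rate is a common TTS default, not a guarantee of the library's output format:

```python
import math
import struct
import wave

# Create a one-second sine-wave WAV as a stand-in for TTS output.
with wave.open("Hasan.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)       # 16-bit PCM
    w.setframerate(22050)   # common TTS sample rate (an assumption here)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.3 * math.sin(2 * math.pi * 440 * t / 22050)))
        for t in range(22050)
    )
    w.writeframes(frames)

# Inspect the file -- a useful sanity check after synthesis.
with wave.open("Hasan.wav", "rb") as w:
    duration = w.getnframes() / w.getframerate()
    print(f"{w.getframerate()} Hz, {duration:.1f} s")  # 22050 Hz, 1.0 s
```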
3. Voice Activity Detection (VAD)
Silero VAD is used for detecting speech timestamps in audio files:
from parent.factory import ModelFactory
model = ModelFactory.get_model("silero_vad")
response = model.generate("Youtube.wav")
print("Speech Timestamps:", response)
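Silero VAD typically reports segments as sample offsets in {'start': ..., 'end': ...} dicts; converting them to seconds makes the output readable. The 16 kHz sample rate and dict layout below are assumptions based on Silero's usual format, not neural_sync's documented return value:

```python
SAMPLE_RATE = 16000  # assumed Silero VAD processing rate

def to_seconds(segments, sr=SAMPLE_RATE):
    """Convert sample-offset VAD segments to seconds."""
    return [
        {"start": round(seg["start"] / sr, 2), "end": round(seg["end"] / sr, 2)}
        for seg in segments
    ]

raw = [{"start": 0, "end": 32000}, {"start": 48000, "end": 80000}]
print(to_seconds(raw))  # [{'start': 0.0, 'end': 2.0}, {'start': 3.0, 'end': 5.0}]
```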
4. Speaker Diarization
Pyannote is used for speaker diarization (the gated pipeline requires a Hugging Face access token; replace the placeholder with your own):
from parent.factory import ModelFactory
model = ModelFactory.get_model("pyannote", use_auth_token="<YOUR_HF_TOKEN>")
response = model.generate("Hasan.wav", visualize=True)
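A common follow-up to diarization is computing per-speaker talk time. The (start, end, speaker) tuple layout below is an assumption for illustration; the real pyannote result is an Annotation object that exposes a similar view via itertracks():

```python
from collections import defaultdict

# Assumed segment layout: (start_seconds, end_seconds, speaker_label).
segments = [
    (0.0, 4.5, "SPEAKER_00"),
    (4.5, 7.0, "SPEAKER_01"),
    (7.0, 10.0, "SPEAKER_00"),
]

talk_time = defaultdict(float)
for start, end, speaker in segments:
    talk_time[speaker] += end - start  # accumulate seconds per speaker

print(dict(talk_time))  # {'SPEAKER_00': 7.5, 'SPEAKER_01': 2.5}
```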
5. Automatic Speech Recognition (Speech-To-Text)
NeMo ASR is used for transcribing audio to text:
from parent.factory import ModelFactory
model = ModelFactory.get_model("nemo_asr")
response = model.generate(audio_files=["Hasan.wav"])
print(response)
6. Distil / OpenAI Whisper (Speech-to-Text)
Distil-Whisper is used for transcribing audio to text:
from parent.factory import ModelFactory
model = ModelFactory.get_model("whisper")
response = model.generate("Youtube.wav")
print("Transcription:", response)
OpenAI Whisper is used the same way:
from parent.factory import ModelFactory
model = ModelFactory.get_model("openai-whisper")
response = model.generate("Youtube.wav")
print("Transcription:", response)
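Whisper-style pipelines often emit segment timestamps in seconds, and a frequent post-processing step is formatting them for subtitles. A small helper for SRT-style timestamps (this is a generic utility, not part of neural_sync):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT-style HH:MM:SS,mmm timestamp."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)   # milliseconds per hour
    m, rem = divmod(rem, 60_000)     # milliseconds per minute
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

print(srt_timestamp(3725.5))  # 01:02:05,500
```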
7. Text-to-Image Generation
Stable Diffusion is used for generating images from text prompts:
from parent.factory import ModelFactory
model = ModelFactory.get_model("sd_medium3")
response = model.generate(prompt="House")
image_path = "new_house.png"
response.save(image_path)