
A library to standardize the usage of various machine learning models

Project description

Neural_sync Library

Overview

The Neural_sync Library provides a unified interface for working with different machine learning models across various tasks. This library aims to standardize the way models are loaded, parameters are set, and results are generated, enabling a consistent approach regardless of the model type.

Supported Models

Text-to-Speech (TTS)

  • FastPitch & HiFi-GAN: Convert text to high-quality speech audio.

Speech-to-Text (STT)

  • Distil-Large-V2: Transcribe speech to text.
  • OpenAI Whisper-large-v3: Transcribe speech to text.
  • NeMo ASR: Transcribe speech to text.

Speaker Diarization

  • Pyannote 3.1: Identify and separate speakers in an audio file.

Voice Activity Detection (VAD)

  • Silero VAD: Detect speech segments in audio files.

Text-to-Image Generation

  • Stable Diffusion Medium-3: Generate images from text prompts.

Transformers-based Models

  • llama2, llama3, llama3_1
  • Mistralv2, Mistralv3
  • Phi3.5 Mini
  • AgentLM 7b

Quantized Models

  • LLaMA 2, LLaMA 3, LLaMA 3.1
  • Mistral v2, Mistral v3
  • Phi3.5 Mini
  • AgentLM 7b

Usage Examples

1. Transformers

Transformers models are versatile and can be used for various NLP tasks. Here's an example using the LLaMA 3.1 Instruct model:

from parent.factory import ModelFactory

model = ModelFactory.get_model("llama3_1_8b_instruct")  # no need to specify model_path
params = ModelFactory.load_params_from_json("parameters.json")
model.set_params(**params)
response = model.generate(
    prompt="What is Artificial Intelligence?",
    system_prompt="Answer in German.",
)
print(response)
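The parameters file is plain JSON whose keys become keyword arguments to set_params. Here is a hypothetical parameters.json round-tripped through the standard json module; the key names below are illustrative assumptions, not the library's documented schema:

```python
import json
import tempfile

# Hypothetical parameters.json contents; key names are assumptions,
# not the library's documented schema.
params_json = """{
    "max_new_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": true
}"""

# Round-trip through a file, as load_params_from_json presumably does.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    f.write(params_json)
    path = f.name

with open(path) as f:
    params = json.load(f)

# model.set_params(**params) would then receive these as keyword arguments.
```

Because set_params accepts keyword arguments, any key you put in the JSON maps directly onto a generation parameter.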

Similarly, pass one of the following strings to get_model for the other Transformers models:

  • agentlm
  • Phi3_5
  • llama2
  • llama3_8b_instruct
  • llama3_1_8b_instruct
  • Mistralv2
  • Mistralv3
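Under the hood, a string-keyed factory like ModelFactory is usually just a registry mapping those strings to constructors. A minimal sketch of the pattern, purely illustrative and not the library's actual implementation:

```python
# Minimal sketch of a string-keyed model factory (illustrative only).
class DummyModel:
    def __init__(self, name):
        self.name = name
        self.params = {}

    def set_params(self, **kwargs):
        # Accept arbitrary generation parameters as keyword arguments.
        self.params.update(kwargs)

    def generate(self, prompt, system_prompt=None):
        return f"[{self.name}] {prompt}"

# Registry: model-key string -> constructor.
_REGISTRY = {
    "llama3_1_8b_instruct": lambda: DummyModel("llama3_1_8b_instruct"),
    "Mistralv3": lambda: DummyModel("Mistralv3"),
}

def get_model(key):
    try:
        return _REGISTRY[key]()
    except KeyError:
        raise ValueError(f"Unknown model key: {key}") from None

model = get_model("Mistralv3")
model.set_params(temperature=0.7)
print(model.generate("Hello"))  # [Mistralv3] Hello
```

The registry approach is what makes an unknown string fail fast with a clear error instead of a stray attribute lookup.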

2. FastPitch (Text-to-Speech)

FastPitch is used for generating speech from text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("fastpitch")

response = model.generate(text="Hello, this is Hasan Maqsood", output_path="Hasan.wav")
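A quick way to sanity-check the generated file is to read its header with the standard-library wave module; this assumes the output is a plain PCM WAV, which is typical for FastPitch/HiFi-GAN pipelines:

```python
import wave

def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

# Example (assumes the file above was written):
# print(wav_duration_seconds("Hasan.wav"))
```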

3. Voice Activity Detection (VAD)

Silero VAD is used for detecting speech timestamps in audio files:

from parent.factory import ModelFactory
model = ModelFactory.get_model("silero_vad")

response = model.generate("Youtube.wav")
print("Speech Timestamps:", response)
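Silero VAD typically reports each speech segment as a dict with start/end positions, and merging segments separated by short gaps is a common post-processing step. A sketch assuming sample-offset timestamps at a 16 kHz rate; the field names mirror Silero's usual output but are assumptions here:

```python
def merge_segments(segments, max_gap_s=0.3, sample_rate=16000):
    """Merge speech segments whose gap is below max_gap_s.

    Each segment is a dict with 'start' and 'end' in samples.
    """
    merged = []
    for seg in sorted(segments, key=lambda s: s["start"]):
        if merged and (seg["start"] - merged[-1]["end"]) / sample_rate <= max_gap_s:
            # Gap is short: extend the previous segment.
            merged[-1]["end"] = max(merged[-1]["end"], seg["end"])
        else:
            merged.append(dict(seg))
    return merged

ts = [{"start": 0, "end": 16000}, {"start": 18000, "end": 40000},
      {"start": 100000, "end": 120000}]
print(merge_segments(ts))
# the first two segments merge (gap of 2000 samples = 0.125 s); the third stays separate
```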

4. Speaker Diarization

Pyannote is used for speaker diarization:

from parent.factory import ModelFactory
model = ModelFactory.get_model("pyannote", use_auth_token="<your Hugging Face access token>")
response = model.generate("Hasan.wav", visualize=True)
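A diarization result is essentially a list of labelled time intervals, and rendering it as readable speaker turns is a common next step. A sketch over plain (start, end, speaker) tuples; the real pyannote object exposes an itertracks() iterator, which this simplifies away:

```python
def format_turns(turns):
    """Render (start_s, end_s, speaker) tuples as transcript-style lines."""
    lines = []
    for start, end, speaker in sorted(turns):
        lines.append(f"{start:7.2f}-{end:7.2f}  {speaker}")
    return "\n".join(lines)

turns = [(0.0, 3.2, "SPEAKER_00"), (3.4, 7.9, "SPEAKER_01")]
print(format_turns(turns))
```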

5. Automatic Speech Recognition (Speech-To-Text)

NeMo ASR is used for transcribing audio to text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("nemo_asr")
response = model.generate(audio_files=["Hasan.wav"])
print(response)

6. Distil / OpenAI Whisper (Speech-to-Text)

Distil-Whisper is used for transcribing audio to text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("distil_whisper")
response = model.generate("Youtube.wav")
print("Transcription:", response)

OpenAI Whisper is also used for transcribing audio to text:

from parent.factory import ModelFactory
model = ModelFactory.get_model("openai_whisper")
response = model.generate("Youtube.wav")
print("Transcription:", response)
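Long recordings are often transcribed in overlapping windows and the pieces stitched back together. A sketch of the windowing arithmetic alone; the 30 s chunk length matches common Whisper practice, but the overlap value here is an arbitrary assumption:

```python
def chunk_windows(total_s, chunk_s=30.0, overlap_s=2.0):
    """Return (start, end) windows covering total_s seconds with overlap."""
    windows = []
    start = 0.0
    step = chunk_s - overlap_s
    while start < total_s:
        windows.append((start, min(start + chunk_s, total_s)))
        start += step
    return windows

print(chunk_windows(70.0))
# [(0.0, 30.0), (28.0, 58.0), (56.0, 70.0)]
```

The 2 s overlap gives the stitching step duplicated context at each boundary so words cut mid-window are not lost.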

7. Text-to-Image Generation

Stable Diffusion is used for generating images from text prompts:

from parent.factory import ModelFactory
model = ModelFactory.get_model("sd_medium3")
response = model.generate(prompt="House")
image_path = "new_house.png"
response.save(image_path)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

neural_sync-0.1.2.tar.gz (20.4 kB)

Uploaded Source

Built Distribution

neural_sync-0.1.2-py3-none-any.whl (3.5 kB)

Uploaded Python 3

File details

Details for the file neural_sync-0.1.2.tar.gz.

File metadata

  • Download URL: neural_sync-0.1.2.tar.gz
  • Upload date:
  • Size: 20.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for neural_sync-0.1.2.tar.gz:

  • SHA256: 6cbed0e9ee81b19f0ca611e62e47dc793a0fc42701f9c3dd429cc4f3a93180d0
  • MD5: e0ecfc3f23b63de7135a50b314bf8861
  • BLAKE2b-256: 8af0a06cf587a203fea5b12a9a62505c62e01aec664486b769f1e0cc29dd379e


File details

Details for the file neural_sync-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: neural_sync-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 3.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for neural_sync-0.1.2-py3-none-any.whl:

  • SHA256: c2a8d3af90826d324e8f6a53bb8fff91271c1243acab7af274d6163b328d4a1f
  • MD5: db906c770a48b09a5ca6d310ff15978b
  • BLAKE2b-256: f3e39ae279208cc905bb4c16243821e05d0a76cf851210eda5013235f1ee35eb

