VoxVersa: Few-Shot Language-Agnostic Keyword Spotting (FSLAKWS) System
Overview
VoxVersa is an advanced system designed to efficiently detect and classify keywords across multiple languages using few training samples per keyword. The system leverages cutting-edge meta-learning techniques and audio signal processing to create a flexible, scalable, and adaptable keyword spotting model that works across diverse linguistic environments.
The system processes audio at a range of sample rates (8 kHz to 48 kHz) and can quickly learn new keywords and adapt to different audio conditions, making it well suited to voice-controlled technologies, multilingual customer service, and similar applications.
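Because recordings may arrive at anything from 8 kHz to 48 kHz, they typically need to be resampled to a single rate before feature extraction. The sketch below shows the idea with a toy linear-interpolation resampler in plain PyTorch; the 16 kHz target rate is an assumption, and a production system would use a proper polyphase resampler such as torchaudio's.

```python
import torch

def resample(waveform: torch.Tensor, orig_rate: int, target_rate: int = 16000) -> torch.Tensor:
    """Linearly interpolate a (channels, samples) waveform to target_rate.

    A toy stand-in for a proper resampler (e.g. torchaudio's polyphase
    resample); good enough to illustrate normalizing heterogeneous inputs.
    """
    if orig_rate == target_rate:
        return waveform
    n_out = int(waveform.shape[-1] * target_rate / orig_rate)
    # interpolate expects a 3-D (batch, channels, length) tensor for mode="linear"
    return torch.nn.functional.interpolate(
        waveform.unsqueeze(0), size=n_out, mode="linear", align_corners=False
    ).squeeze(0)

# One second of 8 kHz audio becomes one second of 16 kHz audio.
clip = torch.randn(1, 8000)
resampled = resample(clip, orig_rate=8000)
print(resampled.shape)  # torch.Size([1, 16000])
```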
Features
- Few-Shot Learning: Efficient detection and classification of keywords using very few training samples.
- Language Agnostic: Capable of handling keywords in multiple languages without requiring extensive language-specific training data.
- Audio Flexibility: Processes audio at multiple sample rates (8kHz to 48kHz).
- Meta-Learning: Uses model-agnostic meta-learning techniques for rapid adaptation to new keywords and environments.
- On-Device Processing: Enhances user privacy and security by enabling on-device processing.
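The few-shot idea behind such a system can be illustrated with a prototypical-network-style classifier: embed each support clip, average the embeddings of each class into a prototype, and label a query clip by its nearest prototype. The encoder below is a stand-in (a fixed random projection), not the VoxVersa model; it only demonstrates the classification scheme.

```python
import torch

torch.manual_seed(0)

# Stand-in encoder: a fixed random projection to a 32-d embedding space.
# A real system would use a trained acoustic encoder here.
PROJ = torch.randn(16000, 32) / 16000 ** 0.5

def embed(clips: torch.Tensor) -> torch.Tensor:
    return clips @ PROJ

def prototypes(support: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Average the support embeddings of each class into one prototype."""
    embs = embed(support)
    return torch.stack([embs[labels == c].mean(dim=0) for c in labels.unique()])

def classify(queries: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Label each query clip with the index of its nearest prototype."""
    return torch.cdist(embed(queries), protos).argmin(dim=1)

# Two keyword classes, two support clips each; clips of the same class
# share an underlying signal plus small noise.
kw0, kw1 = torch.randn(16000), torch.randn(16000)
support = torch.stack([kw0, kw0, kw1, kw1]) + 0.1 * torch.randn(4, 16000)
labels = torch.tensor([0, 0, 1, 1])

query = (kw1 + 0.1 * torch.randn(16000)).unsqueeze(0)
print(classify(query, prototypes(support, labels)))  # tensor([1])
```

With only two clips per class, the prototype is already a usable class representative, which is what makes the few-shot setting workable.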
Technologies Used
- Programming Language: Python
- Framework: PyTorch
Installation
To set up the environment for VoxVersa, follow the steps below:
1. Clone the repository:

   git clone https://github.com/Kou-shik2004/SIH-2024.git
   cd SIH-2024

2. Install dependencies: create a virtual environment and install the required Python packages:

   python3 -m venv venv
   source venv/bin/activate  # For Windows: venv\Scripts\activate
   pip install -r requirements.txt

3. Install the package:

   python setup.py install
Usage
Once the environment is set up, you can start training the model on your dataset or testing it on new audio samples.
1. Training the Model
To train the model using a custom dataset, use the following command:
python test_model.py
2. Inference
To run inference with the model:
python inference.py
Customizing for Your Own Few-Shot Data
To train the model on your own few-shot data and use it for inference, you'll need to make changes to the test_model.py and inference.py files. Here are specific instructions based on the current implementation:
Modifying test_model.py:
- Update the support set:
  - Replace the file paths in support_examples with your own audio files.
  - Update the classes list with your own keyword classes.
  - Adjust the int_indices if necessary.

  support_examples = ["./your_clips/keyword1.wav", "./your_clips/keyword2.wav", ...]
  classes = ["keyword1", "keyword2", ...]
  int_indices = [0, 1, 2, ...]
- Modify the model loading if needed:
  - Change the encoder_name or language parameters to match your use case.

  fws_model = model.load(encoder_name="your_encoder", language="your_language", device="cpu")
- Adjust audio processing parameters if necessary:
  - Modify sample_rate and frames_per_buffer to match your audio data.
Modifying inference.py:
- Update the support set:
  - Replace the file paths in support["paths"] with your own audio files.
  - Update the support["classes"] list with your own keyword classes.
  - Adjust the support["labels"] tensor if necessary.

  support = {
      "paths": ["./your_clips/keyword1.wav", "./your_clips/keyword2.wav", ...],
      "labels": torch.tensor([0, 1, 2, ...]),
      "classes": ["keyword1", "keyword2", ...],
  }
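The paths, labels, and classes must stay consistent: each clip's integer label is the index of its class name. A small helper can derive the whole structure from a mapping of class names to clip paths (a convenience sketch, not part of the package; the file paths are placeholders):

```python
import torch

def build_support(paths_by_class: dict) -> dict:
    """Flatten {class_name: [clip paths]} into a support dict whose
    labels are indices into the sorted list of class names."""
    classes = sorted(paths_by_class)
    paths, labels = [], []
    for idx, name in enumerate(classes):
        for path in paths_by_class[name]:
            paths.append(path)
            labels.append(idx)
    return {"paths": paths, "labels": torch.tensor(labels), "classes": classes}

support = build_support({
    "keyword1": ["./your_clips/keyword1_a.wav", "./your_clips/keyword1_b.wav"],
    "keyword2": ["./your_clips/keyword2_a.wav"],
})
print(support["labels"])  # tensor([0, 0, 1])
```

Deriving the labels this way avoids the easiest mistake to make here: a labels tensor that silently disagrees with the order of the classes list.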
- Modify the model loading if needed:
  - Change the encoder_name or language parameters to match your use case.

  fws_model = model.load(encoder_name="your_encoder", language="your_language", device="cpu")
- Adjust the query processing:
  - If you're using different test clips, update the paths in the query dictionary.

  query = {
      "paths": ["./your_test_clips/query1.wav", "./your_test_clips/query2.wav"]
  }
- Fine-tune the inference process:
- You may need to adjust the audio processing parameters or prediction threshold based on your specific use case.
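One common form of such fine-tuning is an open-set confidence threshold: if the model exposes per-class scores, queries that match no keyword well can be rejected rather than forced into a class. The score semantics and the 0.6 threshold below are assumptions to adapt to your model's actual output.

```python
import torch

def predict_with_threshold(scores: torch.Tensor, classes: list,
                           threshold: float = 0.6) -> list:
    """Map raw per-class scores (queries x classes) to class names,
    labelling low-confidence queries as 'unknown'."""
    probs = scores.softmax(dim=1)
    conf, idx = probs.max(dim=1)
    return [classes[i] if c >= threshold else "unknown"
            for c, i in zip(conf.tolist(), idx.tolist())]

# First query matches keyword1 strongly; second is ambiguous between classes.
scores = torch.tensor([[4.0, 0.5], [1.0, 1.1]])
print(predict_with_threshold(scores, ["keyword1", "keyword2"]))
# ['keyword1', 'unknown']
```

Raising the threshold trades missed detections for fewer false accepts; the right balance depends on how costly a spurious keyword trigger is in your application.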
Remember to thoroughly test your modifications to ensure they work correctly with your specific dataset and use case. You may also need to update the requirements.txt file if you introduce any new dependencies.
Running the Customized Model
After making the necessary modifications:
1. To train and test the model:

   python test_model.py

2. To run inference:

   python inference.py
Make sure you have the required audio files in the correct directories before running these scripts.
Download files

- Source Distribution: voxws-1.0.1.tar.gz
- Built Distribution: voxws-1.0.1-py3-none-any.whl
File details
Details for the file voxws-1.0.1.tar.gz.
File metadata
- Download URL: voxws-1.0.1.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3cb8b5db5537f921718d2e46efc73e0c21fd82ffef35b8ee4da74693cf2134b3 |
| MD5 | c92c6d448af1d1478c19b88b5ef6f1d1 |
| BLAKE2b-256 | 465db5f30652dc1bfb62ee3b673247060d0077b44c8212534c9ed02aa813f522 |
File details
Details for the file voxws-1.0.1-py3-none-any.whl.
File metadata
- Download URL: voxws-1.0.1-py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b3c9a80acb487367add1f5096897ae681459706232c5061f1470239cbfd17b13 |
| MD5 | 3f092dbbba0b48aae276643a877f4653 |
| BLAKE2b-256 | 0430f78ae7432514d13099777596cc18c03dd44eab1f933b2480469d03f490f0 |